(57) Abstract

The bacterial serine protease subtilisin BPN' has been mutated so that it will efficiently and selectively cleave substrates containing basic residues. Combination mutants, in which Asn 62 was changed to Asp, Gly 166 was changed to Asp (N62D/G166D) and, optionally, Tyr 104 was changed to Asp, showed a larger than additive shift in specificity toward substrates containing basic residues. Suitable substrates of the subtilisin variants were revealed by sorting a library of phage particles (substrate phage) containing five contiguous randomized residues. This method identified a particularly good substrate, Asn-Leu-Met-Arg-Lys- (SEQ ID NO: 35), which was cleaved in the context of a fusion protein by the N62D/G166D subtilisin variant. A particularly good substrate for N62D/G166D/Y104D would be Asn-Arg-Met-Arg-Lys- (SEQ ID NO: 76). Accordingly, these subtilisin variants are useful for cleaving fusion proteins with basic substrate linkers and for processing hormones or other proteins, in vitro or in vivo, that contain basic cleavage sites.

SUBTILISIN VARIANTS CAPABLE OF CLEAVING SUBSTRATES
CONTAINING BASIC RESIDUES



FIELD OF THE INVENTION 

This invention relates to subtilisin variants having altered specificity relative to wild-type subtilisins. Specifically, the subtilisin variants are modified so that they efficiently and selectively cleave substrates containing basic residues. The invention further relates to the DNA encoding these novel polypeptides, as well as the recombinant materials and methods for producing these subtilisin variants. In a particular aspect, the present invention provides processes for cleaving protein substrates containing basic residues.

BACKGROUND OF THE INVENTION 

Site-specific proteolysis is one of the most common forms of post-translational modification of proteins (for review see Neurath, H. (1989) Trends Biochem. Sci. 14:268). In addition, proteolysis of fusion proteins in vitro is an important research and commercial tool (for reviews see Uhlen, M. and Moks, T. (1990) Methods Enzymol. 185:129-143; Carter, P. (1990) in Protein Purification: From Molecular Mechanisms to Large-Scale Processes, M. R. Ladisch, R. C. Willson, C. D. Painton, S. E. Builder, Eds. (ACS Symposium Series 427, American Chemical Society, Washington, D.C.), Chap. 13, pp. 181-193; and Nilsson, B. et al. (1992) Curr. Opin. Struct. Biol. 2:369). Expressing a protein of interest as a fusion protein facilitates purification when the fusion contains an affinity domain such as glutathione-S-transferase, Protein A or a poly-histidine tail. The fusion domain can also facilitate high-level expression and/or secretion.

Liberating the protein product from the fusion domain requires selective and efficient cleavage of the fusion protein. Both chemical and enzymatic methods have been proposed (see references above). Enzymatic methods are generally preferred, as they tend to be more specific and can be performed under mild conditions that avoid denaturation or unwanted chemical side reactions. A number of natural and even designed enzymes have been applied to site-specific proteolysis. Although some are generally more useful than others (Forsberg, G., Baastrup, B., Rondahl, H., Holmgren, E., Pohl, G., Hartmanis, M. and Lake, M. (1992) J. Prot. Chem. 11:201-211), none is applicable to every situation, given the sequence requirements at the fusion protein junction and the possible occurrence of protease cleavage sequences within the desired protein product. Thus, an expanded array of sequence-specific proteases, analogous to restriction endonucleases, would make site-specific proteolysis a more widely used method for processing fusion proteins or generating protein/peptide fragments either in vitro or in vivo.

The processing of prohormones by the KEX2-related family of serine endoproteases illustrates one of the most precise proteolytic events found in nature (for reviews see Steiner, D. F., Smeekens, S. P., Ohagi, S. and Chan, S. J. (1992) J. Biol. Chem. 267:23435-23438 and Smeekens, S. P. (1993) Bio/Technology 11:182-186). This family of proteases, which includes the yeast KEX2 and the mammalian PC2, PC3 and furin enzymes, is homologous to the bacterial serine protease subtilisin (Kraut, J. (1977) Annu. Rev. Biochem. 46:331-358).

Subtilisin has a broad substrate specificity that reflects its role as a scavenger protease. In contrast, these eukaryotic enzymes are very specific for cleaving substrates containing two basic residues and are thus well suited for site-specific proteolysis.
All of these eukaryotic enzymes strongly require Arg at the P1 position and either Arg, Lys or Pro at the P2 position of peptide substrates. The prohormone convertases from higher eukaryotes, such as furin, PC2 and PC3, also have an absolute requirement for Arg at the P4 position (Bresnahan, P. A., Leduc, R., Thomas, L., Thorner, J., Gibson, H. L., Brake, A. J., Barr, P. J. and Thomas, G. (1990) J. Cell Biol. 111:2851; Wise, R. J., Barr, P. J., Wong, P. A., Kiefer, M. C., Brake, A. J. and Kaufman, R. J. (1990) Proc. Natl. Acad. Sci. USA 87:9378-9382; Hosaka, M., Nagahama, M., Kim, W.-S., Watanabe, T., Hatsuzawa, K., Ikemizu, J., Murakami, K. and Nakayama, K. (1991) J. Biol. Chem. 266:12127-12130; Matthews, D. J., Goodman, L. J., Gorman, C. M. and Wells, J. A. (1994) Protein Science 3:1197-1205).

Despite the very narrow specificity of the prohormone processing enzymes, in some cases they are capable of rapid cleavage of target sequences. For example, the kcat/Km ratio for KEX2 cleaving a good substrate (e.g., acetyl-pMYRK-MCA) is 1.1 x 10^7 M^-1 s^-1 (Brenner, C. and Fuller, R. S. (1992) Proc. Natl. Acad. Sci. USA 89:922-926), compared to 3 x 10^5 M^-1 s^-1 for subtilisin cleaving a good substrate (e.g., suc-AAPF-pNA) (Estell, D. A., Graycar, T. P., Miller, J. V., Powers, D. B., Burnier, J. P., Ng, P. G. and Wells, J. A. (1986) Science 233:659-663).

However, the eukaryotic proteases are expressed in small amounts (Bravo, D. B., Gleason, J. B., Sanchez, R. I., Roth, R. A. and Fuller, R. S. (1994) J. Biol. Chem. 269:25830-25837 and Matthews, D. J., Goodman, L. J., Gorman, C. M. and Wells, J. A. (1994) Protein Science 3:1197-1205), which at present makes them impractical for processing fusion proteins in vitro. Subtilisin BPN', however, can be expressed in large amounts (Wells, J. A., Ferrari, E., Henner, D. J., Estell, D. A. and Chen, E. Y. (1983) Nucl. Acids Res. 11:7911-7929).

Extensive protein engineering studies of subtilisin, and especially subtilisin BPN', have identified several residues in the S1 and S2 subsites of the enzyme active site where amino acid substitutions lead to large changes in substrate specificity (Wells, J. A. and Estell, D. A. (1988) Trends Biochem. Sci. 13:291-297; Carter, P., et al. (1989) Proteins: Structure, Function, and Genetics 6:240-248). X-ray crystal structures of subtilisin containing bound transition state analogues (Wright, C. S., Alden, R. A. and Kraut, J. (1969) Nature 221:235-242; McPhalen, C. A. and James, M. N. G. (1988) Biochemistry 27:6582-6598; Bode, W., Papamokos, E., Musil, D., Seemueller, U. and Fritz, H. (1986) EMBO J. 5:813-818; and Bott, R., Ultsch, M., Kossiakoff, A., Graycar, T., Katz, B. and Power, S. (1988) J. Biol. Chem. 263:7895-7906) can be used to locate active site residues that are in close proximity to side chains at key positions in substrate peptides (Wells, J. A. (1987) Proc. Natl. Acad. Sci. USA 84:1219-1223). Consideration of electrostatic interactions between charged peptide substrates and subtilisin can be used to tailor the substrate binding cleft of subtilisin BPN' to favor complementarily charged substrates (Wells, J. A., et al. (1987) Proc. Natl. Acad. Sci. USA 84:1219-1223). Previous work has shown that replacement of residues at positions 156 and 166 in the S1 binding site of subtilisin BPN' with various charged residues leads to improved specificity for complementarily charged substrates.

A substantial amount of protein engineering has been applied to the specificity determinants of the S4 subsite of subtilisin BPN' in efforts to alter specificity for P4 substrate residues (Eder, J., Rheinnecker, M. and Fersht, A. R. (1993) FEBS Lett. 335:349-352; Rheinnecker, M., Baker, G., Eder, J. and Fersht, A. R. (1993) Biochemistry 32:1199-1203; Rheinnecker, M., Eder, J., Pandey, P. S. and Fersht, A. R. (1994) Biochemistry 33:221-225). However, the mutations introduced consisted entirely of hydrophobic substitutions, thus preserving the overall hydrophobic substrate preference at that site.

Previous attempts to introduce, remove or reverse charge specificity in enzyme active sites have met with considerable difficulty. This has generally been attributed to a lack of stabilization of the introduced charge or enzyme-substrate ion pair complex by the wild-type enzyme environment (Hwang, J. K. and Warshel, A. (1988) Nature 334:270-272). For example, Stennicke et al. (Stennicke, H. R., Ujje, H. M., Christensen, U., Remington, S. J. and Breddam (1994) Protein Eng. 7:911-916) made acidic (D/E) mutations at five residues in the P1' binding site of carboxypeptidase Y in an attempt to change the P1' preference from Phe to Lys/Arg. Only the L272D and L272E mutations were found to alter the specificity in the desired direction, giving up to a 1.…-fold preference for Lys/Arg over Phe; the others simply resulted in less active enzymes having substrate preferences similar to wild type. In the case of trypsin, a protease that is highly specific for basic P1 residues, recruitment of chymotrypsin-like (hydrophobic P1) specificity required not only mutation of the ion pair-forming Asp189 to Ser, but also transplantation of two more distant surface loops from chymotrypsin (Graf, L., Jancso, A., Szilagyi, L., Hegyi, G., Pinter, K., Naray-Szabo, G., Hepp, J., Medzihradszky, K. and Rutter, W. J. (1988) Proc. Natl. Acad. Sci. USA 85:4961-4965 and Hedstrom, L., Szilagyi, L. and Rutter, W. J. (1992) Science 255:1249-1253).

In the present work, we have also verified that relatively little specificity is gained by introducing single ion pairs between enzyme and substrate. However, when two or more selected ionic interactions were simultaneously engineered into subtilisin BPN', the resulting variants had higher specificity for basic residues in each of the subsites, owing to a non-additive effect.
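
A non-additive effect of this kind can be quantified with a short calculation. The sketch below assumes that "additive" means the free-energy contributions of the single substitutions add, so the expected fold shift of a combination mutant is the product of the single-mutant fold shifts; a coupling factor above 1 then indicates synergy. The function names and example numbers are hypothetical and are not data from this work.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def coupling_factor(shift_double: float, shift_a: float, shift_b: float) -> float:
    """Observed combination-mutant fold shift divided by the product of the
    single-mutant fold shifts. 1.0 = additive in free energy; >1 = synergy."""
    return shift_double / (shift_a * shift_b)


def coupling_energy_kcal(shift_double: float, shift_a: float, shift_b: float,
                         temp_k: float = 298.15) -> float:
    """Interaction free energy RT ln(coupling factor) in kcal/mol.
    Positive values mean the combination is better than additive."""
    return R_KCAL * temp_k * math.log(coupling_factor(shift_double, shift_a, shift_b))


# Hypothetical: each single mutant shifts specificity 10-fold,
# the combination shifts it 1000-fold.
print(coupling_factor(1000, 10, 10))                 # 10.0
print(round(coupling_energy_kcal(1000, 10, 10), 2))  # 1.36
```

Working in log (free-energy) space is what makes "additive" well defined here: fold changes multiply, so their logarithms add.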

Accordingly, it is an object to produce a subtilisin variant with basic specificity for use in processing 
pro-proteins made by recombinant techniques. 

SUMMARY OF THE INVENTION 

The present invention provides subtilisin variants with altered substrate specificity. Preferred subtilisin variants are highly specific for the efficient cleavage of substrates containing basic residues. The subtilisin variants have a substrate specificity that is substantially different from the substrate specificity of the precursor subtilisin from which the amino acid sequence of the variant is derived. The amino acid sequences of the subtilisin variants are derived by the substitution of one or more amino acids of a precursor subtilisin amino acid sequence.

In a preferred aspect of the present invention, the subtilisin variants are specific for the cleavage of protein substrates containing basic amino acid residues at substrate positions P1, P2 and P4. According to this aspect of the present invention, subtilisin variants having amino acid substitutions at positions corresponding to amino acid positions 62, 104 and 166 of subtilisin BPN' produced by Bacillus amyloliquefaciens are preferred. Accordingly, subtilisin variants are provided wherein amino acids 62, 104 and 166 of subtilisin BPN' are substituted with acidic amino acids. Preferably the acidic amino acid is Asp or Glu, and most preferably Asp.

Preferred substrates for the subtilisin variants according to this aspect of the present invention contain either Lys (K) or Arg (R) at substrate positions P2 and P1, practically any residue at P3, preferably either Lys or Arg at P4, and again practically any residue at P5. Thus an exemplary good substrate would contain -Asn-Arg-Met-Arg-Lys- (SEQ ID NO: 76) at -P5-P4-P3-P2-P1-, respectively. Additionally, good substrates would not have Pro at P1', P2' or P3', nor would Ile be present at P1'.
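
The substrate preferences just listed can be expressed as a rule check over an eight-residue window. This is a hypothetical helper, not part of the disclosure: it treats the "preferably Lys or Arg at P4" preference as a hard requirement, leaves P5 and P3 unconstrained ("practically any residue"), and applies the stated Pro and Ile exclusions on the prime side. The prime-side residues in the example are made up.

```python
BASIC = {"K", "R"}


def fits_triple_variant_rules(window: str) -> bool:
    """Check a window P5-P4-P3-P2-P1-P1'-P2'-P3' (one-letter code) against the
    substrate preferences stated for the N62D/Y104D/G166D variant."""
    if len(window) != 8:
        raise ValueError("expected P5-P4-P3-P2-P1-P1'-P2'-P3' (8 residues)")
    _p5, p4, _p3, p2, p1, p1_prime, p2_prime, p3_prime = window.upper()
    return (
        p4 in BASIC                                        # P4: Lys/Arg (treated as required)
        and p2 in BASIC                                    # P2: Lys/Arg
        and p1 in BASIC                                    # P1: Lys/Arg
        and "P" not in (p1_prime, p2_prime, p3_prime)      # no Pro at P1'-P3'
        and p1_prime != "I"                                # no Ile at P1'
    )


print(fits_triple_variant_rules("NRMRK" + "AQS"))  # True: -Asn-Arg-Met-Arg-Lys- site
print(fits_triple_variant_rules("NRMRK" + "PQS"))  # False: Pro at P1'
```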

According to a second aspect of the present invention, the subtilisin variants are capable of cleaving protein substrates having basic residues at positions P1 and P2. According to this aspect of the present invention, subtilisin variants having amino acid substitutions at positions corresponding to amino acid positions 62 and 166 of subtilisin BPN' produced by Bacillus amyloliquefaciens are preferred. The preferred subtilisin variants having substrate specificity for dibasic substrates have an acidic amino acid residue at residue position 62 of the subtilisin naturally produced by Bacillus amyloliquefaciens. In a preferred embodiment, the naturally occurring Asn at residue position 62 of subtilisin BPN' is substituted with an acidic amino acid residue such as Glu or Asp, and most preferably Asp. The preferred subtilisin variants, having specificity for substrates with dibasic amino acid residues, additionally have an acidic residue, Asp or Glu, at residue position 166 of subtilisin BPN'. Thus, subtilisin BPN' variants in which amino acids 62 and 166 are substituted with the acidic amino acids Glu or Asp are preferred. In particular, a subtilisin variant having Asp at positions 62 and 166 is preferred (subtilisin BPN' variant N62D/G166D). The subtilisin variants according to this aspect of the invention may be used to cleave substrates containing dibasic residues, such as fusion proteins with dibasic substrate linkers, and to process hormones or other proteins (in vitro or in vivo) that contain dibasic cleavage sites.

Preferred substrates for the subtilisin BPN' variant N62D/G166D contain either Lys (K) or Arg (R) at substrate positions P2 and P1, practically any residue at P3, a non-charged hydrophobic residue at P4, and again practically any residue at P5. Thus an exemplary good substrate would contain -Asn-Leu-Met-Arg-Lys- (SEQ ID NO: 35) at -P5-P4-P3-P2-P1-, respectively. Additionally, good substrates would not have Pro at P1', P2' or P3', nor would Ile be present at P1'.

The invention also includes mutant DNA sequences encoding such subtilisin variants. These mutant DNA sequences are derived from a precursor DNA sequence which encodes a naturally occurring or recombinant precursor subtilisin. The mutant DNA sequence is derived by modifying the precursor DNA sequence to encode the substitution(s) of one or more amino acids encoded by the precursor DNA sequence. These recombinant DNA sequences encode mutants having an amino acid sequence that does not exist in nature and a substrate specificity that is substantially different from the substrate specificity of the precursor subtilisin encoded by the precursor DNA sequence.

Further, the invention includes expression vectors containing such mutant DNA sequences, as well as host cells transformed with such vectors which are capable of expressing the subtilisin variants.

The invention also provides a process for cleaving a polypeptide, such as a fusion protein, containing a substrate linker represented by the formula:

P4-P3-P2-P1

wherein P4 is a basic amino acid or a large hydrophobic amino acid such as Leu or Met; P3 is an amino acid selected from the naturally occurring amino acids; P2 is a basic amino acid; and P1 is a basic amino acid. The process includes the step of subjecting the polypeptide to the subtilisin variants described herein under conditions such that the subtilisin variant cleaves the polypeptide.
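
As an illustration of the linker formula, the sketch below scans a protein sequence for stretches matching P4-P3-P2-P1, assuming the 20 standard one-letter codes. The function name is hypothetical; it only finds sequence matches and says nothing about how efficiently a given site would actually be cleaved.

```python
import re

# Linker P4-P3-P2-P1: P4 basic (K, R) or large hydrophobic (L, M);
# P3 any standard residue; P2 and P1 basic (K, R).
# The lookahead lets overlapping candidate sites all be reported.
_LINKER = re.compile(r"(?=([KRLM][ACDEFGHIKLMNPQRSTVWY][KR][KR]))")


def candidate_sites(seq: str) -> list[int]:
    """Return 0-based indices of each candidate P1' residue, i.e. the scissile
    bond lies immediately before that index. Matches with no residue after P1
    are skipped, since there is no bond to cleave."""
    seq = seq.upper()
    return [m.start() + 4 for m in _LINKER.finditer(seq) if m.start() + 4 < len(seq)]


print(candidate_sites("GSNLMRKAQ"))  # [7]: bond between K (P1) and A (P1')
```

The prime-side exclusions given earlier for preferred substrates (no Pro at P1' to P3', no Ile at P1') are not applied here; they could be added as a filter on the returned indices.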



BRIEF DESCRIPTION OF THE FIGURES 
Figure 1. Structure of a succinyl-Ala-Ala-Pro-BoroPhe (SEQ ID NO: 69) inhibitor bound to the active site of subtilisin BPN', showing the S2 and S1 binding pocket residues subjected to mutagenesis.

Figure 2. Kinetic analysis of S1 binding site subtilisin mutants versus substrates having variable P1 residues. The kinetic constant kcat/Km was determined from plots of initial rates versus substrate concentration for the tetrapeptide series succinyl-Ala-Ala-Pro-Xaa-pNA (SEQ ID NO: 69), where Xaa was Lys (SEQ ID NO: 58), Arg (SEQ ID NO: 59), Phe (SEQ ID NO: 56), Met (SEQ ID NO: 60) or Gln (SEQ ID NO: 61) (defined to the right of the plot).

Figure 3. Kinetic analysis of S2 binding site subtilisin mutants versus substrates having variable P2 residues. The kinetic constant kcat/Km was determined from plots of initial rates versus substrate concentration for the tetrapeptide series succinyl-Ala-Ala-Xaa-Phe-pNA (SEQ ID NO: 70), where Xaa was Lys (SEQ ID NO: 62), Arg (SEQ ID NO: 64), Ala (SEQ ID NO: 63), Pro (SEQ ID NO: 56) or Asp (SEQ ID NO: 65) (defined on the right of the plot).

Figure 4. Kinetic analysis of combined S1 and S2 binding site subtilisin mutants versus substrates having variable P1 and P2 residues. The kinetic constants kcat/Km were determined from plots of initial rates versus substrate concentration for the tetrapeptide series succinyl-Ala-Ala-Xaa2-Xaa1-pNA (SEQ ID NO: 71), where Xaa2-Xaa1 was Lys-Lys (SEQ ID NO: 66), Lys-Arg (SEQ ID NO: 67), Lys-Phe (SEQ ID NO: 62), Pro-Lys (SEQ ID NO: 58), Pro-Phe (SEQ ID NO: 56) or Ala-Phe (SEQ ID NO: 63) (defined on the right of the plot).

Figure 5. Results of hGH-AP fusion protein assay. hGH-AP fusion proteins were constructed, bound to hGHbp-coupled resin, and treated with 0.3 nM N62D/G166D subtilisin in 20 mM Tris-Cl, pH 8.2. Aliquots were withdrawn at various times, and AP release was monitored by activity assay in comparison to a standard curve. Arrows indicate the cleavage site. The rates of cleavage of fusion proteins containing various substrate linkers are shown. Substrates containing a Pro at position P1' are not cleaved.

Figure 6-1 - 6-10 (collectively referred to herein as Fig. 6). DNA sequence of the phagemid pSS5 containing the N62D/G166D double mutant subtilisin BPN' gene (SEQ ID NO: 1), and translated amino acid sequence for the mutant preprosubtilisin (SEQ ID NO: 2). The pre region comprises residues -107 to -78, the pro region residues -77 to -1, and the mature enzyme residues +1 to +275 (SEQ ID NO: 72). Also shown are restriction sites recognized by endonucleases that require 6 or more specific bases in succession.

Figure 7. Structure of a succinyl-Ala-Ala-Pro-BoroPhe (SEQ ID NO: 69) inhibitor bound to the active site of subtilisin BPN', showing the S1, S2 and S4 binding pocket residues subjected to mutagenesis.

Figure 8. DNA sequence of the N62D/Y104D/G166D triple mutant (SEQ ID NO: 74) as well as the translated amino acid sequence (SEQ ID NO: 75). The pre region comprises residues -107 to -78, the pro region residues -77 to -1, and the mature enzyme residues +1 to +275. The pro region reflects the changes A(-4)R, …(-2)…, and Y(-1)R made in the wild-type processing site to effect expression.

DETAILED DESCRIPTION OF THE INVENTION

Definitions 

Terms used in the claims and specification are defined as set forth below unless otherwise specified. 
The term amino acid or amino acid residue, as used herein, refers to naturally occurring L-amino acids or residues, unless otherwise specifically indicated. The commonly used one- and three-letter abbreviations for amino acids are used herein (Lehninger, A. L., Biochemistry, 2d ed., pp. 71-92, Worth Publishers, N.Y. (1975)). Basic amino acids are Arg and Lys. Acidic amino acids are Asp and Glu.

Substrates are described in three-letter or single-letter code as Pn...P2-P1-P1'-P2'...Pn'. The "P1" residue refers to the position preceding (i.e., N-terminal to) the scissile peptide bond (i.e., the bond between the P1 and P1' residues) of the substrate, as defined by Schechter and Berger (Schechter, I. and Berger, A. (1967) Biochem. Biophys. Res. Commun. 27:157-162). Similarly, the term P1' is used to refer to the position following (i.e., C-terminal to) the scissile peptide bond of the substrate. Increasing numbers refer to the next consecutive positions preceding (e.g., P2 and P3) and following (e.g., P2' and P3') the scissile bond. According to the present invention, the scissile peptide bond is the bond that is cleaved by the subtilisin variants of the instant invention.
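
To make the Schechter and Berger nomenclature concrete, the hypothetical helper below labels every residue of a peptide given the location of the scissile bond. The index convention (the bond lies just before `seq[scissile]`, so that residue is P1') is an assumption of this sketch.

```python
def schechter_berger_labels(seq: str, scissile: int) -> dict[str, str]:
    """Map Schechter-Berger position names to residues.

    The scissile bond lies between seq[scissile - 1] (P1) and seq[scissile] (P1').
    Positions count outward from the bond: P1, P2, ... toward the N-terminus and
    P1', P2', ... toward the C-terminus.
    """
    if not 0 < scissile < len(seq):
        raise ValueError("the scissile bond must lie between two residues of seq")
    labels = {}
    for i, residue in enumerate(seq):
        if i < scissile:
            labels[f"P{scissile - i}"] = residue
        else:
            labels[f"P{i - scissile + 1}'"] = residue
    return labels


# -Asn-Leu-Met-Arg-Lys- from the text, followed by three made-up prime-side residues.
labels = schechter_berger_labels("NLMRKAQS", 5)
print(labels["P4"], labels["P1"], labels["P1'"])  # L K A
```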

"Subtilisins," "precursor subtilisin" and the like are bacterial carbonyl hydrolases which generally act to cleave peptide bonds of proteins or peptides. As used herein, "subtilisin" means a naturally occurring subtilisin or a recombinant subtilisin. A series of naturally occurring subtilisins is known to be produced and often secreted by various bacterial species (Siezen, R. J., et al. (1991) Protein Engineering 4:719-737). The amino acid sequences of the members of this series are not entirely homologous. However, the subtilisins in this series exhibit the same or similar type of proteolytic activity. This class of serine proteases shares a common amino acid sequence defining a catalytic triad, which distinguishes them from the chymotrypsin-related class of serine proteases. The subtilisins and the chymotrypsin-related serine proteases both have a catalytic triad comprising aspartate, histidine and serine. In the subtilisin-related proteases, the relative order of these amino acids, reading from the amino to the carboxy terminus, is aspartate-histidine-serine. In the chymotrypsin-related proteases, however, the relative order is histidine-aspartate-serine. Thus, subtilisins as used herein refer to serine proteases having the catalytic triad of the subtilisin-related proteases.

Generally, subtilisins are serine endoproteases having molecular weights of about 27,500 which are secreted in large amounts from a wide variety of Bacillus species. The protein sequences of subtilisins have been determined from at least four different species of Bacillus (Markland, F. S., et al. (1971) in The Enzymes, ed. Boyer, P. D., Acad. Press, New York, Vol. III, pp. 361-608; and Nedkov, P. et al. (1983) Hoppe-Seyler's Z. Physiol. Chem. 364:1537-1540). The three-dimensional crystallographic structures of four subtilisins have been reported (BPN' from Bacillus amyloliquefaciens, Hirono et al. (1984) J. Mol. Biol. 178:389-413; subtilisin Carlsberg from Bacillus licheniformis, Bode et al. (1986) EMBO J. 5:813-818; thermitase from Thermoactinomyces vulgaris, Gros et al. (1989) J. Mol. Biol. 210:347-367; and proteinase K from Tritirachium album, Betzel et al. (1988) Acta Crystallogr. B 44:163-172). The three-dimensional structure of subtilisin BPN' (from B. amyloliquefaciens) at 2.5 Å resolution has also been reported by Wright, C. S. et al. (1969) Nature 221:235-242 and Drenth, J. et al. (1972) Eur. J. Biochem. 26:177-181. These studies indicate that, although subtilisin is genetically unrelated to the mammalian serine proteases, it has a similar active site structure. The x-ray crystal structures of subtilisin containing covalently bound peptide inhibitors (Robertus, J. D., et al.
(1972) Biochemistry 11:2439-2449), product complexes (Robertus, J. D., et al. (1972) Biochemistry 11:4293-4303) and transition state analogs (Matthews, D. A., et al. (1975) J. Biol. Chem. 250:7120-7126 and Poulos, T. L., et al. (1976) J. Biol. Chem. 251:1097-1103), which have been reported, have also provided information regarding the active site and putative substrate binding cleft of subtilisins. In addition, a large number of kinetic and chemical modification studies have been reported for subtilisins (Philipp, M., et al. (1983) Mol. Cell. Biochem. 51:5-32; Svendsen, I. B. (1976) Carlsberg Res. Commun. 41:237-291; and Markland, F. S., id.), as well as at least one report wherein the side chain of methionine at residue 222 of subtilisin was converted by hydrogen peroxide to methionine sulfoxide (Stauffer, D. C., et al. (1969) J. Biol. Chem. 244:5333-5338).

"Subtilisin variant," "subtilisin mutant" and the like refer to a subtilisin-type serine protease having a sequence that is not found in nature and that is derived from a precursor subtilisin according to the present invention. The subtilisin variant has a substrate specificity different from that of the precursor subtilisin by virtue of amino acid substitutions within the precursor subtilisin amino acid sequence. The term is meant to include subtilisin variants in which the DNA sequence encoding the precursor subtilisin is modified to produce a mutant DNA sequence which encodes the substitution of one or more amino acids in the naturally occurring subtilisin amino acid sequence. Suitable methods to produce such modifications include those disclosed in U.S. Patent Nos. 4,760,025 and 5,371,008 and in EPO Publication Nos. 0130756 and 0251446.

A change in substrate specificity is defined as a difference between the kcat/Km ratio of the precursor subtilisin and that of the subtilisin variant. The kcat/Km ratio is a measure of catalytic efficiency. Subtilisin variants with increased or decreased kcat/Km ratios compared to the precursor subtilisin from which they were derived are described herein. Generally, the objective is to secure a variant having a greater, i.e., numerically larger, kcat/Km ratio for a given substrate. A greater kcat/Km ratio for a particular substrate indicates that the variant may be used to cleave the target substrate more efficiently.
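
The figure legends state that kcat/Km was obtained from plots of initial rate versus substrate concentration, but the fitting procedure is not given. One common approach, assumed here, is to work at substrate concentrations well below Km, where the Michaelis-Menten equation reduces to v0 ≈ (kcat/Km)[E][S], and take the least-squares slope of a line through the origin. The function name and the synthetic data are hypothetical.

```python
def kcat_over_km(substrate_m, rates_m_per_s, enzyme_m):
    """Estimate kcat/Km (M^-1 s^-1) from initial rates.

    Fits v0 = slope * [S] through the origin by least squares, then divides the
    slope by [E]. Valid only in the regime [S] << Km.
    """
    if len(substrate_m) != len(rates_m_per_s) or not substrate_m:
        raise ValueError("need matching, non-empty concentration and rate lists")
    sxy = sum(s * v for s, v in zip(substrate_m, rates_m_per_s))
    sxx = sum(s * s for s in substrate_m)
    return sxy / sxx / enzyme_m


# Synthetic rates generated with kcat/Km = 3e5 M^-1 s^-1 and [E] = 1 nM.
enzyme = 1e-9
substrate = [1e-6, 2e-6, 4e-6]
rates = [3e5 * enzyme * s for s in substrate]
print(f"{kcat_over_km(substrate, rates, enzyme):.3g}")  # 3e+05
```

If the substrate range approaches or exceeds Km, a full nonlinear Michaelis-Menten fit for kcat and Km separately would be the more appropriate choice.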

The specificity, or discrimination between two or more competing substrates, is determined by the ratios of their kcat/Km values (Fersht, A. R. (1985) in Enzyme Structure and Mechanism, W. H. Freeman and Co., N.Y., p. 112). An increase in the kcat/Km ratio for one substrate may be accompanied by a reduction in the kcat/Km ratio for another substrate. Such a shift in substrate specificity indicates that the variant subtilisin with the increased kcat/Km ratio for a substrate is more useful than the precursor subtilisin for cleaving that particular substrate, for example, in preventing undesirable hydrolysis of a particular substrate in a mixture of substrates.
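
The discrimination ratio and the resulting specificity shift follow directly from the kcat/Km values. The sketch below uses hypothetical function names and made-up values for a basic substrate A and a hydrophobic substrate B; it is an illustration of the ratio arithmetic, not data from this work.

```python
def discrimination(kcat_km_a: float, kcat_km_b: float) -> float:
    """Preference for substrate A over substrate B: (kcat/Km)_A / (kcat/Km)_B."""
    return kcat_km_a / kcat_km_b


def specificity_shift(variant: tuple[float, float],
                      precursor: tuple[float, float]) -> float:
    """Fold change in the A-over-B preference of the variant relative to the
    precursor. Each tuple holds (kcat/Km for A, kcat/Km for B)."""
    return discrimination(*variant) / discrimination(*precursor)


# Hypothetical kcat/Km values in M^-1 s^-1.
precursor = (1e3, 1e5)  # precursor prefers B by 100-fold
variant = (1e5, 1e4)    # variant prefers A by 10-fold
print(specificity_shift(variant, precursor))  # about 1000, i.e. a 1000-fold shift toward A
```

Note that the shift combines an increase for one substrate with a decrease for the other, as described in the paragraph above.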

In general, for a subtilisin variant to have a useful catalytic efficiency for cleavage of a particular substrate, the kcat/Km ratio will generally be between 1 x 10^… M^-1 s^-1 and about 1 x 10^7 M^-1 s^-1. More often, the kcat/Km ratio will be between about 1 x 10^4 M^-1 s^-1 and 1 x 10^… M^-1 s^-1.

When referring to mutants or variants, the wild type amino acid residue is followed by the residue number 
and the new or substituted amino acid residue. For example, substitution of D for wild type N in residue position 
62 is denominated N62D. 

"Subtilisin variants or mutants" are designated in the same manner, using the single-letter amino acid code for the wild-type residue, followed by its position and the single-letter amino acid code of the replacement residue. Multiple mutants are indicated by their component single mutants separated by slashes. Thus the subtilisin BPN' variant N62D/G166D is a di-substituted variant in which Asp replaces Asn and Gly at residue positions 62 and 166, respectively, in wild-type subtilisin BPN'.
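
This naming convention can be parsed mechanically. The sketch below is a hypothetical helper that splits a multiple-mutant name at the slashes and, optionally, applies it to a sequence while checking that the named wild-type residue is present. It handles only positive positions in single-letter code, and the toy sequence in the example is not subtilisin.

```python
import re
from typing import NamedTuple

_ONE_LETTER = "ACDEFGHIKLMNPQRSTVWY"
_TOKEN = re.compile(rf"([{_ONE_LETTER}])(\d+)([{_ONE_LETTER}])")


class Substitution(NamedTuple):
    wild_type: str
    position: int
    replacement: str


def parse_variant(name: str) -> list[Substitution]:
    """Split a name such as 'N62D/G166D' into its component substitutions."""
    subs = []
    for token in name.split("/"):
        m = _TOKEN.fullmatch(token.strip())
        if m is None:
            raise ValueError(f"not a single-letter substitution: {token!r}")
        wt, pos, new = m.groups()
        subs.append(Substitution(wt, int(pos), new))
    return subs


def apply_variant(seq: str, subs: list[Substitution], first_position: int = 1) -> str:
    """Apply substitutions to seq (numbered from first_position), verifying each
    wild-type residue before replacing it."""
    residues = list(seq)
    for s in subs:
        i = s.position - first_position
        if not 0 <= i < len(residues) or residues[i] != s.wild_type:
            raise ValueError(f"{s.wild_type}{s.position} not found in sequence")
        residues[i] = s.replacement
    return "".join(residues)


for sub in parse_variant("N62D/Y104D/G166D"):
    print(sub.wild_type, sub.position, sub.replacement)
```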

An amino acid residue of a precursor carbonyl hydrolase is "equivalent" to a residue of B. amyloliquefaciens subtilisin if it is either homologous (i.e., corresponding in position in either primary or tertiary structure) or analogous to a specific residue or portion of that residue in B. amyloliquefaciens subtilisin (i.e., having the same or similar functional capacity to combine, react, or interact chemically).

In order to establish homology to primary structure, the amino acid sequence of a precursor carbonyl hydrolase is directly compared to the B. amyloliquefaciens subtilisin primary sequence, and particularly to a set of residues known to be invariant in all subtilisins for which the sequences are known (see, e.g., Figure 5-C in EPO 0251446). After aligning the conserved residues, allowing for necessary insertions and deletions in order to maintain alignment (i.e., avoiding the elimination of conserved residues through arbitrary deletion and insertion), the residues equivalent to particular amino acids in the primary sequence of B. amyloliquefaciens subtilisin are defined. Alignment of conserved residues should conserve 100% of such residues. However, alignment of greater than 75%, or of as little as 50%, of the conserved residues is also adequate to define equivalent residues. Conservation of the catalytic triad, Asp32/His64/Ser221, is required.
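
The primary-structure criteria above reduce to two checks, sketched here under the assumption that an alignment has already been made and is supplied as a mapping from BPN' positions to precursor residues. The function names are hypothetical, the invariant-residue set in the example is a placeholder apart from the catalytic triad, and the 50% default reflects the lowest fraction the text calls adequate.

```python
CATALYTIC_TRIAD = {32: "D", 64: "H", 221: "S"}  # Asp32/His64/Ser221, BPN' numbering


def conserved_fraction(aligned: dict[int, str], invariant: dict[int, str]) -> float:
    """Fraction of invariant BPN' residues matched in the aligned precursor.
    aligned maps BPN' position -> precursor residue; gap positions are absent."""
    kept = sum(1 for pos, aa in invariant.items() if aligned.get(pos) == aa)
    return kept / len(invariant)


def defines_equivalent_residues(aligned: dict[int, str], invariant: dict[int, str],
                                min_fraction: float = 0.5) -> bool:
    """The catalytic triad must be conserved, and at least min_fraction of the
    invariant residues must be aligned (100% ideal; >75% or 50% acceptable)."""
    triad_ok = all(aligned.get(pos) == aa for pos, aa in CATALYTIC_TRIAD.items())
    return triad_ok and conserved_fraction(aligned, invariant) >= min_fraction


# Placeholder invariant set: the triad plus two illustrative positions.
invariant = {**CATALYTIC_TRIAD, 100: "G", 101: "G"}
aligned = {32: "D", 64: "H", 221: "S", 100: "A"}
print(conserved_fraction(aligned, invariant), defines_equivalent_residues(aligned, invariant))
```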

Equivalent residues homologous at the level of tertiary structure for a precursor carbonyl hydrolase whose tertiary structure has been determined by x-ray crystallography are defined as those for which the atomic coordinates of two or more of the main chain atoms of a particular amino acid residue of the precursor carbonyl hydrolase and B. amyloliquefaciens subtilisin (N on N, CA on CA, C on C, and O on O) are within 0.13 nm, and preferably 0.1 nm, after alignment. Alignment is achieved after the best model has been oriented and positioned to give the maximum overlap of atomic coordinates of non-hydrogen protein atoms of the carbonyl hydrolase in question to the B. amyloliquefaciens subtilisin. The best model is the crystallographic model giving the lowest R factor for experimental diffraction data at the highest resolution available.

$$R\ \text{factor} = \frac{\sum_{h} \big|\, |F_o(h)| - |F_c(h)| \,\big|}{\sum_{h} |F_o(h)|}$$
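
As a small illustration of the R factor used to select the best model, the sketch sums the absolute differences between observed and calculated structure-factor amplitudes, |Fo(h)| and |Fc(h)|, over reflections h and divides by the sum of the observed amplitudes. The function name and amplitudes are hypothetical toy values; among candidate models, the one with the lowest value would be preferred.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R factor: sum_h ||Fo(h)| - |Fc(h)|| / sum_h |Fo(h)|."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need matching, non-empty lists of structure-factor amplitudes")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


# Toy amplitudes for three reflections.
print(round(r_factor([100.0, 200.0, 50.0], [90.0, 210.0, 50.0]), 4))  # 0.0571
```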

Equivalent amino acid residues of subtilisin BPN', subtilisin Carlsberg, thermitase and proteinase K, determined from tertiary structure analysis, are provided in, for example, Siezen et al. (1991) Prot. Eng. 4:719-737.

Equivalent residues which are functionally analogous to a specific residue of B. amyloliquefaciens subtilisin are defined as those amino acids of the precursor carbonyl hydrolases which may adopt a conformation such that they either alter, modify or contribute to protein structure, substrate binding or catalysis in a manner defined and attributed to a specific residue of the B. amyloliquefaciens subtilisin as described herein. Further, they are those residues of the precursor carbonyl hydrolase (for which a tertiary structure has been obtained by x-ray crystallography) which occupy an analogous position to the extent that, although the main chain atoms of the given residue may not satisfy the criteria of equivalence on the basis of occupying a homologous position, the atomic coordinates of at least two of the side chain atoms of the residue lie within 0.13 nm of the corresponding side chain atoms of B. amyloliquefaciens subtilisin. The three-dimensional structures would be aligned as outlined above.
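
Both structural criteria count atom pairs that lie within 0.13 nm after the structures have been superposed. The sketch below assumes the superposition (maximum overlap of non-hydrogen atoms) has already been done and is not implemented here; for the side-chain test, the caller must choose which side-chain atoms correspond. Function names and coordinates are hypothetical.

```python
import math

MAIN_CHAIN = ("N", "CA", "C", "O")


def count_close_pairs(pairs, cutoff_nm=0.13):
    """Count atom pairs ((x, y, z), (x, y, z)), in nm, lying within cutoff_nm."""
    return sum(1 for a, b in pairs if math.dist(a, b) <= cutoff_nm)


def main_chain_equivalent(res_a, res_b, cutoff_nm=0.13, min_atoms=2):
    """res_a, res_b: dict atom name -> coordinates (nm), already superposed.
    Homologous if at least min_atoms of N, CA, C, O coincide within cutoff_nm."""
    pairs = [(res_a[n], res_b[n]) for n in MAIN_CHAIN if n in res_a and n in res_b]
    return count_close_pairs(pairs, cutoff_nm) >= min_atoms


def side_chain_analogous(side_chain_pairs, cutoff_nm=0.13, min_atoms=2):
    """Analogous if at least min_atoms caller-paired side-chain atoms coincide."""
    return count_close_pairs(side_chain_pairs, cutoff_nm) >= min_atoms


res_a = {"N": (0.0, 0.0, 0.0), "CA": (0.146, 0.0, 0.0), "C": (0.2, 0.14, 0.0), "O": (0.3, 0.2, 0.0)}
res_b = {"N": (0.05, 0.0, 0.0), "CA": (0.196, 0.0, 0.0), "C": (0.5, 0.14, 0.0), "O": (0.6, 0.2, 0.0)}
print(main_chain_equivalent(res_a, res_b))  # True: N and CA are each 0.05 nm apart
```

Passing `cutoff_nm=0.1` applies the preferred, stricter threshold mentioned in the text.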

Some of the residues identified for substitution are conserved residues, whereas others are not. In the case of residues which are not conserved, the replacement of one or more amino acids is limited to substitutions which produce a mutant having an amino acid sequence that does not correspond to one found in nature. In the case of conserved residues, such replacements should not result in a naturally occurring sequence. The subtilisin mutants of the present invention include the mature forms of subtilisin mutants as well as the pro- and prepro-forms of such subtilisin mutants. The prepro-forms are the preferred construction, since this facilitates the expression, secretion and maturation of the subtilisin mutants.

"Prosequence" refers to a sequence of amino acids bound to the N-terminal portion of the mature form of a subtilisin which, when removed, results in the appearance of the "mature" form of the subtilisin. Many proteolytic enzymes are found in nature as translational proenzyme products and, in the absence of post-translational processing, are expressed in this fashion. The preferred prosequence for producing subtilisin mutants, specifically subtilisin BPN' mutants, is the putative prosequence of B. amyloliquefaciens subtilisin, although other subtilisin prosequences may be used. For example, when the substrate specificity of the precursor subtilisin is altered according to the present invention, this alteration may affect the ability of the variant subtilisin to undergo autolytic cleavage of the naturally occurring prosequence. In order to effect the expression and proper folding of a mature variant subtilisin whose substrate specificity has been altered, it may be necessary to alter the prosequence to correspond to the new or variant substrate specificity.

As an example, the substrate specificity of a particular subtilisin variant, N62D/Y104D/G166D, is distinct from that of the precursor subtilisin from which it was derived. The subtilisin variant prefers substrates containing basic residues at substrate positions P4, P2, and P1. According to this aspect of the present invention, the precursor prosequence, which was efficiently autolysed by the precursor subtilisin, is altered to correspond to the substrate specificity of the variant subtilisin. Therefore, for the subtilisin variant N62D/Y104D/G166D, the prosequence would be altered to contain basic residues at positions -4, -2, and -1.

A "signal sequence" or "presequence" refers to any sequence of amino acids bound to the N-terminal portion of a subtilisin or to the N-terminal portion of a prosubtilisin which may participate in the secretion of the mature or pro forms of the subtilisin. This definition of signal sequence is a functional one, meant to include all those amino acid sequences, encoded by the N-terminal portion of the subtilisin gene or other secretable carbonyl hydrolases, which participate in effecting the secretion of subtilisin or other carbonyl hydrolases under native conditions. The present invention utilizes such sequences to effect the secretion of the subtilisin mutants as defined herein.

A "prepro" form of a subtilisin mutant consists of the mature form of the subtilisin having a prosequence operably linked to the amino terminus of the subtilisin and a "pre" or "signal" sequence operably linked to the amino terminus of the prosequence.

"Expression vector" refers to a DNA construct containing a DNA sequence which is operably linked to a suitable control sequence capable of effecting the expression of the DNA in a suitable host. Such control sequences include a promoter to effect transcription, an optional operator sequence to control such transcription, a sequence encoding suitable mRNA ribosome binding sites, and sequences which control termination of transcription and translation. The vector may be a plasmid, a phage particle, or simply a potential genomic insert. Once transformed into a suitable host, the vector may replicate and function independently of the host genome, or may, in some instances, integrate into the genome itself. In the present specification, "plasmid" and "vector" are sometimes used interchangeably, as the plasmid is the most commonly used form of vector at present. However, the invention is intended to include such other forms of expression vectors which serve equivalent functions and which are, or become, known in the art.

The "host cells" used in the present invention generally are procaryotic or eucaryotic hosts which preferably have been manipulated by the methods disclosed in EPO Publication No. 0130756 or 0251446 or U.S. Patent No. 5,371,008 to render them incapable of secreting enzymatically active endoprotease. A preferred host cell for expressing subtilisin is the Bacillus strain BG2036, which is deficient in enzymatically active neutral protease and alkaline protease (subtilisin). The construction of strain BG2036 is described in detail in EPO Publication No. 0130756 and further described in ... 160:15-21. Such host cells are distinguishable from those disclosed in PCT Publication No. 03949, wherein enzymatically inactive mutants of intracellular proteases in E. coli are disclosed. Other host cells for expressing subtilisin include Bacillus subtilis var. I168 (EPO Publication No. 0130756).

Host cells are transformed or transfected with vectors constructed using recombinant DNA techniques. Such transformed host cells are capable of either replicating vectors encoding the subtilisin mutants or expressing the desired subtilisin mutant. In the case of vectors which encode the pre or prepro form of the subtilisin mutant, such mutants, when expressed, are typically secreted from the host cell into the host cell medium.

"Operably linked," when describing the relationship between two DNA regions, simply means that they are functionally related to each other. For example, a presequence is operably linked to a peptide if it functions as a signal sequence, participating in the secretion of the mature form of the protein, most probably involving cleavage of the signal sequence. A promoter is operably linked to a coding sequence if it controls the transcription of the sequence; a ribosome binding site is operably linked to a coding sequence if it is positioned so as to permit translation.

The genes encoding the naturally occurring precursor subtilisin may be obtained in accord with the general methods described in U.S. Patent No. 4,760,025 or EPO Publication No. 0130756. As can be seen from the examples disclosed therein, the methods generally comprise synthesizing labeled probes having putative sequences encoding regions of the hydrolase of interest, preparing genomic libraries from organisms expressing the hydrolase, and screening the libraries for the gene of interest by hybridization to the probes. Positively hybridizing clones are then mapped and sequenced.

The cloned subtilisin is then used to transform a host cell in order to express the subtilisin. The subtilisin gene is then ligated into a high copy number plasmid. This plasmid replicates in hosts in the sense that it contains the well-known elements necessary for plasmid replication: a promoter operably linked to the gene in question (which may be supplied as the gene's own homologous promoter if it is recognized, i.e., transcribed, by the host), a transcription termination and polyadenylation region (necessary for stability of the mRNA transcribed by the host from the hydrolase gene in certain eucaryotic host cells) which is exogenous or is supplied by the endogenous terminator region of the subtilisin gene and, desirably, a selection gene such as an antibiotic resistance gene that enables continuous cultural maintenance of plasmid-infected host cells by growth in antibiotic-containing media. High copy number plasmids also contain an origin of replication for the host, thereby enabling large numbers of plasmids to be generated in the cytoplasm without chromosomal limitations. However, it is within the scope herein to integrate multiple copies of the subtilisin gene into the host genome. This is facilitated by procaryotic and eucaryotic organisms which are particularly susceptible to homologous recombination.

Once the subtilisin gene has been cloned, a number of modifications are undertaken to enhance the use of the gene beyond synthesis of the naturally occurring precursor subtilisin. Such modifications include the production of recombinant subtilisin as disclosed in U.S. Patent No. 5,371,008 or EPO Publication No. 0130756 and the production of the subtilisin mutants described herein.

Mutant design and preparation.

A. Subtilisin Variants Capable of Cleaving Substrates Having Dibasic Residues. 

For the preparation of subtilisin variants capable of cleaving substrates containing dibasic residues, the 
following analysis was undertaken. 

A number of structures have been solved of subtilisin with a variety of inhibitors and transition state analogs bound (Wright, C. S., Alden, R. A. and Kraut, J. (1969) Nature 221:235-242; McPhalen, C. A. and James, M. N. G. (1988) Biochemistry 27:6582-6598; Bode, W., Papamokos, E., Musil, D., Seemueller, U. and Fritz, H. (1986) EMBO J. 5:813-818; and Bott, R., Ultsch, M., Kossiakoff, A., Graycar, T., Katz, B. and Power, S. (1988) J. Biol. Chem. 263:7895-7906). One of these structures (Figure 1) was used to locate residues that are in close proximity to side chains at the P1 and P2 positions of the substrate. Previous work had shown that replacing residues at positions 156 and 166 in the S1 binding site with various charged residues led to improved specificity for complementary charged substrates (Wells, J. A., Powers, D. B., Bott, R. R., Graycar, T. P. and Estell, D. A. (1987) Proc. Natl. Acad. Sci. USA 84:1219-1223). Although longer-range electrostatic effects on substrate specificity have been noted (Russell, A. J. and Fersht, A. R. (1987) Nature 328:496-500), these were generally much smaller than local ones. Therefore, it seemed reasonable that local differences in charge between subtilisin BPN' and the eukaryotic enzymes may account for the differences in specificity.

A detailed sequence alignment of 35 different subtilisin-like enzymes (Siezen, R. J., de Vos, W. M., Leunissen, A. M., and Dijkstra, B. W. (1991) Prot. Eng. 4:719-737) allowed us to identify differences between subtilisin BPN' and the eukaryotic processing enzymes KEX2, furin and PC2. Within the S1 binding pocket there are a number of charged residues that appear in the prohormone processing enzymes and not in subtilisin BPN' (Table 1A).
TABLE 1A

S1 subsite (residue numbering according to the subtilisin BPN' sequence)

Enzyme            125-131                   151-157                   163-168
Subtilisin BPN'   SLGGPSG (SEQ ID NO: 3)    AAAGNEG (SEQ ID NO: 4)    ST-VGYP (SEQ ID NO: 5)
Kex2              SWGPADD (SEQ ID NO: 6)    FASGNGG (SEQ ID NO: 7)    CNYDGYT (SEQ ID NO: 8)
Furin             SWGPEDD (SEQ ID NO: 9)    WASGNGG (SEQ ID NO: 10)   CNCDGYT (SEQ ID NO: 11)
PC2               SWGPADD (SEQ ID NO: 6)    WASGDGG (SEQ ID NO: 12)   CNCDGYA (SEQ ID NO: 13)



For example, the eukaryotic enzymes have two conserved Asp residues at 130 and 131 as well as an Asp at 165 that is preceded by insertion of a Tyr or Cys. However, in the region from 151-157, subtilisin BPN' contains a Glu where the eukaryotic enzymes have a conserved Gly.

In the S2 binding site there were two notable differences in sequence (Table 1B).

TABLE 1B

S2 subsite

Enzyme            30-35                     60-64
Subtilisin BPN'   VIDSGI (SEQ ID NO: 14)    DNNSH (SEQ ID NO: 15)
KEX2              IVDDGL (SEQ ID NO: 16)    SDDYH (SEQ ID NO: 17)
Furin             ILDDGI (SEQ ID NO: 18)    NDNRH (SEQ ID NO: 19)
PC2               IMDDGI (SEQ ID NO: 20)    WFNSH (SEQ ID NO: 21)


Subtilisin BPN' contains a Ser at position 33, whereas the prohormone processing enzymes contain Asp. There is not as clear a consensus in the region of 60-64, but one notable difference is at position 62. This side chain, which points directly at the P2 side chain (Figure 1), is Asn in subtilisin BPN', furin and PC2 but Asp in KEX2. Thus, not all substitutions were clearly predictive of the specificity differences.

A variety of mutants were produced to probe and engineer the specificity of subtilisin BPN' using the oligonucleotides described in Table 2.
TABLE 2

Oligonucleotides used for site-directed mutagenesis on subtilisin (an asterisk follows each changed base)

Mutant | Oligonucleotide | Specificity pocket | Activity expressed
S33D | 5'-GCGGTTATCGACG*A*CGGTATCGATTCT-3' (SEQ ID NO: 22) | S2 | +
S33K | 5'-GCGGTTATCGACAA*A*GGTATCGATTCT-3' (SEQ ID NO: 23) | S2 |
S33E | 5'-GCGGTTATCGACG*A*A*GGTATCGATTCT-3' (SEQ ID NO: 24) | S2 |
N62D | 5'-CCAAGACAACG*ACTCTCACGGAA-3' (SEQ ID NO: 25) | S2 | +
N62S | 5'-CCAAGACAACAG*CTCTCACGGAA-3' (SEQ ID NO: 26) | S2 |
N62K | 5'-CCAAGACAACAAA*TCTCACGGAA-3' (SEQ ID NO: 27) | S2 | +
G166D | 5'-CACTTCCGGCAGCTCG*T*C*G*ACAGTGGA*C*TACCCTGGCAAATA-3' (SEQ ID NO: 28) (inserts SalI site) | S1 |
G166E | 5'-CACTTCCGGCAGCTCG*T*C*G*ACAGTGGA*GTACCCTGGCAAATA-3' (SEQ ID NO: 29) (inserts SalI site) | S1 | +
G128P/P129A | 5'-TTAACATGAGCCTCGGCC*C*AG*CTA*G*C*GGTTCTGCTGCTTTA-3' (SEQ ID NO: 30) (inserts NheI site) | S1 |
G128P/P129A/S130D/G131D | 5'-TTAACATGAGCCTCGGCC*C*C*G*CGG*A*TGA*TTCTGCTGCTTTAAA-3' (SEQ ID NO: 31) (inserts SacII site) | S1 |
T164N/V165D | 5'-CGGCAGCTCAAGCA*A*C*G*A*GGCTAT*CCTGGCAAATACCCTTCTGTCA-3' (SEQ ID NO: 32) (inserts BsaBI site) | S1 |
 | 5'-CGGCAGCTCAAGCA*A*C*G*A*T*GGCTAT*CCTGGCAAATACCCTTCTGTCA-3' | S1 |
T164N-Y(insert)-V165D | 5'-ACTTCCGGCAGCTCG*T*C*G*AA*C*T*A*C*G*A*C*GGGTACCCTGGCAAATA-3' | S1 |
N62D/G166D | See individual mutations | S1/S2 |
N62D/G166E | See individual mutations | S1/S2 | +

After the mutant plasmids were produced, they were transformed into a protease-deficient strain of B. subtilis (BG2036) that lacks an endogenous gene for secreted subtilisin. These were then tested for protease activity on skim milk plates.

The first set of mutants tested were ones in which segments of the S1 binding site were replaced with sequences from KEX2. None of these segment replacements produced detectable activity on skim milk plates, even though variants of subtilisin whose catalytic efficiencies are reduced by as much as 1000-fold do produce detectable activity (Wells, J. A., Cunningham, B. C., et al. (1986) Philos. Trans. R. Soc. Lond. A 317:415-423). We went on to produce single-residue substitutions that should have less impact on stability. These mutants, at position 166 in the S1 site and positions 33 and 62 in the S2 site, were chosen based on the modeling and sequence considerations described above. Fortunately, all single mutants as well as combination mutants produced activity on skim milk plates and could be purified to homogeneity.

Kinetic analysis of variant subtilisins.
To probe the effects of the G166E and G166D substitutions on specificity at the P1 position, we used substrates of the form suc-AAPX-pNA (SEQ ID NO: 69), where X was either Lys (SEQ ID NO: 58), Arg (SEQ ID NO: 59), Phe (SEQ ID NO: 56), Met (SEQ ID NO: 60) or Gln (SEQ ID NO: 61). The kcat/Km values were determined from initial rate measurements, and the results are reported in Figure 2. Whereas the wild-type enzyme preferred Phe>Met>Lys>Arg>Gln, G166E preferred Lys~Phe>Arg~Met>Gln and G166D preferred Lys>Phe~Arg~Met>Gln. Thus, both acidic substitutions at position 166 shifted the preference toward basic residues at the P1 site, as previously reported (Wells, J. A., Powers, D. B., Bott, R. R., Graycar, T. P. and Estell, D. A. (1987a) Proc. Natl. Acad. Sci. USA 84:1219-1223).
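The preference orders above are rankings of kcat/Km values. As a minimal sketch, assuming initial rates measured at a substrate concentration well below Km (the text does not state the assay conditions, and a full Michaelis-Menten fit is equally possible), kcat/Km can be estimated from one initial rate and the substrates then ranked. Both function names are hypothetical.

```python
def kcat_over_km(v0, enzyme_conc, substrate_conc):
    """Estimate kcat/Km (M^-1 s^-1) from a single initial rate.

    Assumes [S] << Km, where v0 = kcat[E][S] / (Km + [S]) reduces to
    v0 ~ (kcat/Km)[E][S]. v0 is in M/s; concentrations are in M.
    """
    return v0 / (enzyme_conc * substrate_conc)


def preference_order(efficiencies):
    """Rank substrates by decreasing kcat/Km as an 'A>B>C' string."""
    ranked = sorted(efficiencies, key=efficiencies.get, reverse=True)
    return ">".join(ranked)
```

The "~" entries in the text mark values too close to separate; this sketch always writes ">", so reporting near-ties would require an added tolerance.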

The effects of single and double substitutions in the S2 binding site were analyzed with substrates of the form suc-Ala-Ala-Xaa-Phe-pNA (SEQ ID NO: 70) and are shown in Figure 3. At the P2 position the wild-type enzyme preferred Ala>Pro>Lys>Arg>Asp. In contrast, S33D preferred Ala>Lys~Arg~Pro>Asp and N62D preferred Lys>Ala>Arg>Pro>Asp. Although the effects were more dramatic for the N62D mutant, the S33D variant also showed significant improvement toward basic P2 residues and a corresponding reduction in hydrolysis of the Ala and Asp P2 substrates. We then analyzed the S33D/N62D double mutant, but found that it exhibited the catalytic efficiency of the worse of the two single mutants for each of the substrates tested.

Despite the less than additive effects seen for the two charged substitutions in the S2 site, we decided to combine the best S2 site variant (N62D) with either of the acidic substitutions in the S1 site. The two double mutants, N62D/G166E and N62D/G166D, were analyzed with substrates of the form suc-AAXX-pNA (SEQ ID NO: 71), where XX was KK, KR, KF, PK (SEQ ID NO: 58), PF (SEQ ID NO: 56) or AF (SEQ ID NO: 63) (Figure 4). The wild-type preference was AF>PF~KF>KK~PK>KR, whereas the double mutants had the preference KK>KR>KF>PK~AF>PF. Thus, for the double mutants there was a dramatic improvement toward cleavage of dibasic substrates and away from cleavage of hydrophobic substrates.

The greater than additive effect (or synergy) of these mutations can be seen from ratios of the catalytic efficiencies for the single and multiple mutants. For example, the G166E variant cannot distinguish Lys from Phe at the P1 position, yet the N62D/G166E variant cleaves the Lys-Lys substrate about 8 times faster than the Lys-Phe substrate. Similarly, G166D cleaves the Lys P1 substrate about 3 times faster than the Phe P1 substrate, but the N62D/G166D double mutant cleaves a Lys-Lys substrate 18 times faster than a Lys-Phe substrate. Thus, as opposed to the reduction in specificity seen for the double mutant in the S2 site, the S1-S2 double mutants enhance specificity for basic residues. It is possible that these two sites bind dibasic substrates in a cooperative manner analogous to a chelate effect.
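The synergy argument above is a ratio of ratios. The sketch below reproduces that arithmetic from the approximate figures quoted in the text and, as an added assumption not made in the text, converts each enhancement factor to an equivalent free-energy difference with the standard relation RT ln(ratio) at an assumed 25 degrees C. Like the text, it compares the single mutants on Lys versus Phe at P1 with the double mutants on Lys-Lys versus Lys-Phe; the helper names are hypothetical.

```python
import math

# Lys-over-Phe discrimination at P1 (ratio of kcat/Km values) quoted in the text,
# given as (single S1 mutant, corresponding N62D double mutant).
DISCRIMINATION = {
    "G166E": (1.0, 8.0),   # G166E cannot tell Lys from Phe; N62D/G166E ~8-fold
    "G166D": (3.0, 18.0),  # G166D ~3-fold; N62D/G166D ~18-fold
}

R_KCAL = 1.987e-3  # gas constant, kcal mol^-1 K^-1
T_K = 298.15       # assumed temperature (25 degrees C); not stated in the text


def enhancement(single_ratio, double_ratio):
    """Factor by which adding N62D increases the Lys/Phe discrimination."""
    return double_ratio / single_ratio


def ddg_kcal(ratio, temperature=T_K):
    """Free-energy difference equivalent to a kcat/Km ratio: RT ln(ratio)."""
    return R_KCAL * temperature * math.log(ratio)


for variant, (single, double) in DISCRIMINATION.items():
    factor = enhancement(single, double)
    print(f"{variant}: N62D raises discrimination {factor:.0f}-fold "
          f"(~{ddg_kcal(factor):.1f} kcal/mol)")
```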

Therefore, according to the present invention, subtilisin mutants having a preference for dibasic residues are preferred. According to this aspect of the present invention, variants are prepared in which the amino acids corresponding to amino acids N62 and G166 of subtilisin BPN' produced by Bacillus amyloliquefaciens are substituted. In particular, amino acids 62 and 166, or their equivalents, in the precursor subtilisin are substituted with the amino acid residues Asp or Glu. Preferred subtilisin variants according to this aspect of the invention include the N62D/G166D, N62E/G166E, N62E/G166D, and N62D/G166E variants of subtilisin BPN' and their equivalents.
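The variant names used throughout (for example N62D/G166D) give the wild-type residue, its position, and the replacement residue. As a hedged illustration, the hypothetical helpers below parse such names and apply them to a mature sequence numbered from 1, refusing to proceed if the stated wild-type residue does not match the sequence. They do not handle the insertion form or the negative prosequence positions that also appear in Tables 2 and 4.

```python
import re

# One substitution, e.g. "N62D": wild-type residue, position, replacement residue.
_SUB = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def parse_variant(name):
    """Split a name such as 'N62D/G166D' into (wild_type, position, new) tuples."""
    subs = []
    for token in name.split("/"):
        match = _SUB.match(token.strip())
        if match is None:
            raise ValueError(f"unrecognized substitution: {token!r}")
        wt, pos, new = match.groups()
        subs.append((wt, int(pos), new))
    return subs


def apply_variant(mature_seq, name, offset=1):
    """Apply the substitutions in `name` to a sequence numbered from `offset`.

    Raises ValueError if a wild-type residue does not match, which catches
    numbering mistakes before any sequence is returned.
    """
    seq = list(mature_seq)
    for wt, pos, new in parse_variant(name):
        i = pos - offset
        if not 0 <= i < len(seq) or seq[i] != wt:
            raise ValueError(f"{wt}{pos} does not match the sequence")
        seq[i] = new
    return "".join(seq)


print(parse_variant("N62D/G166D"))  # [('N', 62, 'D'), ('G', 166, 'D')]
```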

B. Subtilisin Variants Capable of Cleaving Substrates Having Tribasic Residues.

For the preparation of subtilisin variants specific for substrates containing a third basic residue at substrate position P4, we used the crystal structure of subtilisin BPN' complexed with Ala-Ala-Pro-Phe-boronate (SEQ ID NO: 56) (Figure 7), in combination with sequence alignments of subtilisin BPN', KEX2, furin, PC2, and P (Table 3), to design basic specificity into the S1, S2 and S4 subsites. The two subtilisin BPN' residues that most prominently display their side chains into the S4 pocket are Y104 and I107 (Figure 7).

Sequence alignments of subtilisin BPN' and the mammalian prohormone-processing proteases (Siezen, R. J., de Vos, W. M., Leunissen, A. M., and Dijkstra, B. W. (1991) Prot. Eng. 4:719-737) (Table 3) reveal that position 104 is conserved as Asp, and position 107 as Glu, in the prohormone-converting (Arg-P4 specific) enzymes. Therefore these two mutations were introduced, either individually or in combination, into the dibasic-specific N62D/G166D subtilisin BPN' background (Table 4).
Table 3. Sequence alignments for the S4 site of subtilisins

Enzyme       S4 site, residues 100-110
Subtilisin   GSGQYSWIING   (SEQ ID NO: 77)
KEX2         GDITTEDEAAS   (SEQ ID NO: 78)
Furin        GEVTDAVEARS   (SEQ ID NO: 79)
PC2          PFMTDIIEASS   (SEQ ID NO: 80)
P            GIVTDAIEASS   (SEQ ID NO: 81)



Table 4 describes the oligonucleotides used for site-directed mutagenesis, the protein regions affected by the mutations, and the relative expression of protein for N62D/G166D subtilisin BPN' variants. Bold type indicates base changes from the pSS5 (N62D/G166D) template. For "Protein Expressed," "+" indicates a high level of expression of mature enzyme in crude culture medium, and "-" indicates no detectable enzyme.
A(-4)R/A(-2)K/Y(-1)R

Y104D/I107E/A(-4)R/A(-2)K/Y(-1)R

5'-GGTTCCGGCCAAGATAGCTGGATCATT-3' (SEQ ID NO: 82)

5'-CCAATACAGCTGGGAAATTAACGGAATCG-3' (SEQ ID NO: 83)

5'-GGTTCCGGCCAAGATAGCTGGGAAATTAACGGAATCGA-3' (SEQ ID NO: 84)

5'-AAGAAGATCACGTAAGACATAAGCGCGCGCAGTCCGTGC-3' (SEQ ID NO: 85)

See individual mutations
Initial attempts to express the triple mutants in Bacillus were unsuccessful, as indicated by SDS-PAGE of crude supernatants. We reasoned that the source of the expression problem could lie in the fact that correct folding and maturation of subtilisin require autolytic cleavage of its propeptide (Power, S. D., Adams, R. M., and Wells, J. A. (1986) Proc. Natl. Acad. Sci. USA 83:3096-3100). The processing site in the wild-type enzyme has a sequence that is optimized for the natural substrate preference, AHAY|A (| denotes the site of cleavage). Although the N62D/G166D subtilisin can still autolyze itself with the wild-type processing site, the additional S4 pocket mutations could reduce cleavage to the point where expression was lowered to a minute level.

To test whether the mutants were expressed poorly because they could not autolytically process themselves, mutations in the processing site were simultaneously incorporated to accommodate the changes in substrate specificity. Thus, the sequence from positions -4 to -1 was changed from AHAY to RHKR in combination with the S4 site mutations. For N62D/Y104D/G166D, high levels of expression could then be achieved, providing an indication that the additional Y104D mutation induced an especially strong preference for Arg over Ala at P4. Variants containing the I107E mutation, however, could not be expressed even with the change in the processing site.
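The processing-site change described above amounts to substituting residues at positions counted backward from the cleavage site. A minimal sketch, with a hypothetical function name, checks each wild-type residue before substituting it and uses Python's negative indexing, so that position -1 maps to the residue immediately before the scissile bond.

```python
def mutate_processing_site(tail, changes):
    """Substitute residues at negative positions of a prosequence tail.

    `tail` ends with residue -1, the residue just N-terminal to the cleavage
    site. `changes` maps each negative position to a (wild_type, replacement)
    pair, e.g. {-4: ("A", "R")}.
    """
    residues = list(tail)
    for pos, (wt, new) in changes.items():
        if pos >= 0 or -pos > len(residues):
            raise ValueError(f"position {pos} lies outside the given tail")
        if residues[pos] != wt:  # negative index: -1 is the last residue
            raise ValueError(f"expected {wt} at {pos}, found {residues[pos]}")
        residues[pos] = new
    return "".join(residues)


# A(-4)R/A(-2)K/Y(-1)R applied to the wild-type P4-P1 residues AHAY:
print(mutate_processing_site("AHAY", {-4: ("A", "R"), -2: ("A", "K"), -1: ("Y", "R")}))  # RHKR
```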

Kinetic analysis of variant subtilisins.

The mature N62D/Y104D/G166D variant was purified and analyzed for its ability to hydrolyze several tetrapeptide-pNA substrates. Table 5 displays the results along with data for the N62D/G166D mutant and wild-type subtilisin.
The tribasic substrates succinyl-RAKR-pNA (SEQ ID NO: 86) and succinyl-KAKR-pNA (SEQ ID NO: 87) were hydrolyzed with high catalytic efficiency (kcat/Km) by the triple mutant, at a level similar to that of wild-type subtilisin with one of its best substrates, succinyl-AAPF-pNA (SEQ ID NO: 56). In contrast, the dibasic substrate succinyl-AAKR-pNA (SEQ ID NO: 67) was hydrolyzed 60-fold less efficiently, mostly owing to a diminution of kcat. This indicates a dramatic specificity change from the wild-type preference at P4, at which hydrophobic residues are strongly favored over charged side chains (Grøn, H. and Breddam, K. (1992) Biochemistry 31:8967-8971). In fact, N62D/G166D subtilisin appears to cleave at an alternate site in the succinyl-RAKR-pNA (SEQ ID NO: 86) substrate, indicating that Arg was not accepted in its wild-type S4 site.

The large magnitude of the combined specificity changes in the N62D/Y104D/G166D variant is evidenced by its strong discrimination against substrates that are preferred by the wild-type enzyme. For example, succinyl-AAPF-pNA (SEQ ID NO: 56) is hydrolyzed 6 x 10^4-fold less efficiently than succinyl-RAKR-pNA (SEQ ID NO: 86). Clearly, the S4 site mutation greatly improves upon the discriminatory power of the parent dibasic-specific N62D/G166D subtilisin, for which the ratio of catalytic efficiency for succinyl-AAKR-pNA versus succinyl-AAPF-pNA is 1.9 x 10^2. The improvement in discrimination (310-fold) is also higher than would be predicted from the data for hydrolysis of succinyl-RAKR-pNA (SEQ ID NO: 86) versus succinyl-AAKR-pNA (SEQ ID NO: 67) by the triple mutant (a 60-fold effect).
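The discrimination figures above combine as a simple ratio. This sketch reuses only the numbers quoted in the text; because 6 x 10^4 and 1.9 x 10^2 are rounded, the computed improvement (about 316-fold) differs slightly from the 310-fold value stated, which presumably derives from unrounded data.

```python
# kcat/Km ratios quoted in the text (rounded values).
TRIPLE_RAKR_VS_AAPF = 6e4    # N62D/Y104D/G166D: suc-RAKR-pNA over suc-AAPF-pNA
PARENT_AAKR_VS_AAPF = 1.9e2  # N62D/G166D: suc-AAKR-pNA over suc-AAPF-pNA
TRIPLE_RAKR_VS_AAKR = 60.0   # N62D/Y104D/G166D: suc-RAKR-pNA over suc-AAKR-pNA

# Gain in discrimination against the wild-type-preferred substrate suc-AAPF-pNA
improvement = TRIPLE_RAKR_VS_AAPF / PARENT_AAKR_VS_AAPF
# How far that gain exceeds the P4 Arg-versus-Ala effect measured with the triple mutant
excess = improvement / TRIPLE_RAKR_VS_AAKR

print(f"improvement in discrimination: about {improvement:.0f}-fold")
print(f"that is about {excess:.1f} times the 60-fold P4 effect")
```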

Therefore, in order to produce subtilisin variants capable of cleaving substrates containing basic residues at positions P4, P2, and P1, additional site-specific substitutions are made in the dibasic-specific subtilisin variants. According to this aspect of the invention, substitution of the amino acid corresponding to Y104 of subtilisin BPN' produced by Bacillus amyloliquefaciens, i.e., amino acid 104 of subtilisin BPN' or its equivalent, produces a variant having substantially altered substrate specificity. In a preferred embodiment of the present invention, amino acids corresponding to N62, Y104, and G166 of subtilisin BPN' are substituted with acidic amino acids, preferably Asp and Glu and most preferably Asp. Subtilisin BPN' variants N62D/Y104D/G166D, N62D/Y104E/G166D, N62E/Y104D/G166E, N62E/Y104E/G166E, N62E/Y104D/G166D, N62E/Y104E/G166D, N62D/Y104E/G166E, and N62D/Y104D/G166E, and their equivalents, are preferred. Most preferred among this group of subtilisin variants are the N62D/Y104D/G166D subtilisin BPN' variants and their equivalents.
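The eight preferred triple variants listed above are exactly the combinations of Asp or Glu at positions 62, 104, and 166. A short enumeration (the helper name is hypothetical) makes that explicit, and restricting it to positions 62 and 166 reproduces the four dibasic-specific variants listed earlier.

```python
from itertools import product

SITES = [("N", 62), ("Y", 104), ("G", 166)]  # wild-type residue and position in BPN'
ACIDIC = ("D", "E")                          # Asp or Glu


def acidic_variants(sites=SITES, replacements=ACIDIC):
    """Name every combination of the given replacements at the given sites."""
    names = []
    for choice in product(replacements, repeat=len(sites)):
        parts = [f"{wt}{pos}{new}" for (wt, pos), new in zip(sites, choice)]
        names.append("/".join(parts))
    return names


triple = acidic_variants()
print(len(triple), triple[0])  # 8 N62D/Y104D/G166D
print(acidic_variants(sites=[("N", 62), ("G", 166)]))
# ['N62D/G166D', 'N62D/G166E', 'N62E/G166D', 'N62E/G166E']
```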



Mutagenesis and Synthetic Techniques
Various techniques are available which may be employed to produce mutant DNA encoding the subtilisin variants of the present invention. For instance, it is possible to derive mutant DNA based on naturally occurring DNA sequences that encode for changes in an amino acid sequence of the resultant protein relative to a precursor subtilisin. These mutant DNAs can be used to obtain the variants of the present invention.

According to the invention, specific residues of B. amyloliquefaciens subtilisin are identified for substitution. These amino acid residue position numbers refer to those assigned to the B. amyloliquefaciens subtilisin sequence (see the mature sequence in Fig. 1 of U.S. Patent No. 4,760,025). The invention, however, is not limited to the mutation of this particular subtilisin but extends to precursor subtilisins containing amino acid residues which are equivalent, as defined herein, to the particular identified residues in B. amyloliquefaciens subtilisin. Equivalent amino acids can be found in, for instance, subtilisin Carlsberg from Bacillus licheniformis (Bode et al., (1986) EMBO J. 5:813-818); thermitase from Thermoactinomyces vulgaris (Gros et al., (1989) J. Mol. Biol. 210:347-367); and proteinase K from Tritirachium album (Betzel et al., (1988) Acta Crystallogr. B 44:163-172), as described by Siezen et al., (1991) Prot. Eng. 4:719-737.

By way of illustration, with expression vectors encoding the precursor subtilisin in hand (see, for example, U.S. Patent No. 4,760,025), site-specific mutagenesis (Kunkel et al., (1991) Methods Enzymol. 204:125-139; Carter, P., et al., (1986) Nucl. Acids Res. 13:4331; Zoller, M. J. et al., (1982) Nucl. Acids Res. 10:6487), cassette mutagenesis (Wells, J. A., et al., (1985) Gene 34:315), restriction selection mutagenesis (Wells, J. A., et al., (1986) Philos. Trans. R. Soc. London Ser. A 317:415) or other known techniques may be performed on the DNA. The mutant DNA can then be used in place of the parent DNA by insertion into the appropriate expression vectors. Growth of host bacteria containing the expression vectors with the mutant DNA allows the production of variants which can be isolated as described herein.
Oligonucleotide-mediated mutagenesis is a preferred method for preparing the variants of the present invention. This technique is well known in the art, as described by Adelman et al., (1983) DNA 2:183. Briefly, the native or unaltered DNA of a precursor subtilisin, for instance subtilisin BPN', is altered by hybridizing an oligonucleotide encoding the desired mutation to a DNA template, where the template is the single-stranded form of a plasmid or bacteriophage containing the unaltered or native DNA sequence of the precursor subtilisin.
After hybridization, a DNA polymerase is used to synthesize an entire second complementary strand of the template that will thus incorporate the oligonucleotide primer and will code for the selected alteration in the DNA.

Generally, oligonucleotides of at least 25 nucleotides in length are used. An optimal oligonucleotide will have 12 to 15 nucleotides that are completely complementary to the template on either side of the nucleotide(s) coding for the mutation. This ensures that the oligonucleotide will hybridize properly to the single-stranded DNA template molecule. The oligonucleotides are readily synthesized using techniques known in the art, such as those described by Crea et al., (1978) Proc. Natl. Acad. Sci. USA 75:5765. Exemplary oligonucleotide sequences for introducing amino acid changes into precursor subtilisin BPN' are provided in Tables 2 and 4.
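The flank rule above can be sketched as a small design routine. This is a hypothetical helper, not the procedure used to design Tables 2 and 4: it replaces one codon on the coding strand, keeps a chosen number of perfectly matched bases on each side of the changed bases, and marks each changed base with a following asterisk, as in Table 2. The wild-type stretch in the example is an assumption, inferred by reversing the marked change in the Table 2 N62D oligonucleotide. Because that oligonucleotide is only 23 nucleotides long, the example relaxes the 25-nucleotide minimum that the defaults enforce.

```python
def mutagenic_oligo(template, codon_start, new_codon, left=13, right=13, min_length=25):
    """Design a coding-strand mutagenic oligonucleotide (5' to 3').

    template:    coding-strand DNA (string of A/C/G/T)
    codon_start: 0-based index of the first base of the codon to replace
    new_codon:   replacement codon (three bases)
    left, right: perfectly matched bases kept on each side of the changed bases
    Each changed base is followed by '*'.
    """
    old = template[codon_start:codon_start + 3]
    if len(old) != 3 or len(new_codon) != 3:
        raise ValueError("codon must be three bases lying inside the template")
    changed = [i for i in range(3) if old[i] != new_codon[i]]
    if not changed:
        raise ValueError("new codon is identical to the template codon")
    first = codon_start + changed[0]   # first mismatched base in the template
    last = codon_start + changed[-1]   # last mismatched base in the template
    if first - left < 0 or last + 1 + right > len(template):
        raise ValueError("template too short for the requested flanks")
    middle = "".join(
        new_codon[i] + ("*" if i in changed else "")
        for i in range(changed[0], changed[-1] + 1)
    )
    oligo = template[first - left:first] + middle + template[last + 1:last + 1 + right]
    if len(oligo.replace("*", "")) < min_length:
        raise ValueError("oligonucleotide shorter than min_length")
    return oligo


wt_n62 = "CCAAGACAACAACTCTCACGGAA"  # assumed wild-type stretch around codon 62 (AAC)
print(mutagenic_oligo(wt_n62, 10, "GAC", left=10, right=12, min_length=0))
# CCAAGACAACG*ACTCTCACGGAA
```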

Single-stranded DNA template may also be generated by denaturing double-stranded plasmid (or other) DNA using standard techniques.

For alteration of the native DNA sequence (to generate amino acid sequence variants, for example), the oligonucleotide is hybridized to the single-stranded template under suitable hybridization conditions. A DNA-polymerizing enzyme, usually the Klenow fragment of DNA polymerase I, is then added to synthesize the complementary strand of the template using the oligonucleotide as a primer for synthesis. A heteroduplex molecule is thus formed such that one strand of DNA encodes the variant subtilisin and the other strand (the original template) encodes the native, unaltered sequence of the precursor subtilisin. This heteroduplex molecule is then transformed into a suitable host cell. After the cells are grown, they are plated onto agarose plates and screened using the oligonucleotide primer radiolabeled with 32-phosphate to identify the bacterial colonies that contain the mutated DNA. The mutated region is then removed and placed in an appropriate vector for protein production, generally an expression vector of the type typically employed for transformation of an appropriate host.

The method described immediately above may be modified such that a homoduplex molecule is created wherein both strands of the plasmid contain the mutation(s). The modifications are as follows: The single-stranded oligonucleotide is annealed to the single-stranded template as described above. A mixture of three deoxyribonucleotides, deoxyriboadenosine (dATP), deoxyriboguanosine (dGTP), and deoxyribothymidine (dTTP), is combined with a modified thio-deoxyribocytosine called dCTP-(αS) (which can be obtained from Amersham Corporation). This mixture is added to the template-oligonucleotide complex. Upon addition of DNA polymerase to this mixture, a strand of DNA identical to the template except for the mutated bases is generated. In addition, this new strand of DNA will contain dCTP-(αS) instead of dCTP, which serves to protect it from restriction endonuclease digestion.

After the template strand of the double-stranded heteroduplex is nicked with an appropriate restriction enzyme, the template strand can be digested with ExoIII nuclease or another appropriate nuclease past the region that contains the site(s) to be mutagenized. The reaction is then stopped to leave a molecule that is only partially single-stranded. A complete double-stranded DNA homoduplex is then formed using DNA polymerase in the presence of all four deoxyribonucleotide triphosphates, ATP, and DNA ligase. This homoduplex molecule can then be transformed into a suitable host cell as described above.

DNA encoding variants with more than one amino acid to be substituted may be generated in one of
several ways. If the amino acids are located close together in the polypeptide chain, they may be mutated
simultaneously using one oligonucleotide that codes for all of the desired amino acid substitutions. If,
however, the amino acids are located some distance from each other (separated by more than about ten amino
acids), it is more difficult to generate a single oligonucleotide that encodes all of the desired changes.
Instead, one of two alternative methods may be employed.
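The choice between one oligonucleotide and several can be sketched as a simple grouping rule. This is a minimal illustration only, assuming the text's figure of about ten amino acids is applied as a hard cutoff measured from the first site in each group; `group_sites` and `max_span` are hypothetical names, not part of any published tool.

```python
def group_sites(positions, max_span=10):
    """Group substitution sites so that each group spans at most max_span residues.

    Under the rule of thumb in the text, sites within one group could be changed
    with a single mutagenic oligonucleotide; each group needs its own oligonucleotide
    (or its own round of mutagenesis).
    """
    groups = []
    for pos in sorted(positions):
        # Extend the current group only if the new site stays within max_span
        # residues of that group's first site; otherwise start a new group.
        if groups and pos - groups[-1][0] <= max_span:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    return groups


# N62D/Y104D/G166D: the three sites are far apart, so each needs its own oligonucleotide.
print(group_sites([62, 104, 166]))  # [[62], [104], [166]]
# Two nearby sites (hypothetical) can share one oligonucleotide.
print(group_sites([66, 62, 104]))   # [[62, 66], [104]]
```

Measuring from the first site of a group keeps every group within one oligonucleotide-sized window; a real primer design would also need flanking sequence on both sides of the outermost mismatches.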

In the first method, a separate oligonucleotide is generated for each amino acid to be substituted. The
oligonucleotides are then annealed to the single-stranded template DNA simultaneously, and the second
strand of DNA that is synthesized from the template will encode all of the desired amino acid substitutions.

The alternative method involves two or more rounds of mutagenesis to produce the desired mutant.
The first round is as described for the single mutants: wild-type DNA is used for the template, an
oligonucleotide encoding the first desired amino acid substitution(s) is annealed to this template, and the
heteroduplex DNA molecule is then generated. The second round of mutagenesis utilizes the mutated DNA
produced in the first round of mutagenesis as the template. Thus, this template already contains one or more
mutations. The oligonucleotide encoding the additional desired amino acid substitution(s) is then annealed
to this template, and the resulting strand of DNA now encodes mutations from both the first and second
rounds of mutagenesis. This resultant DNA can be used as a template in a third round of mutagenesis, and
so on.
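The round-by-round approach can be mimicked at the protein level: each substitution is checked against the residue expected on the current template before it is applied, so the product of one round becomes the template for the next. This is a sketch only; `apply_mutations` is a hypothetical helper, the notation follows the N62D/G166D style used in this document, and the eight-residue fragment (read here as residues 59-66) is illustrative.

```python
import re


def apply_mutations(seq, notation, first_residue=1):
    """Apply substitutions written like "N62D/G166D" to a one-letter sequence.

    Each substitution is checked against the expected wild-type residue, and
    the result of each step is the template for the next, as in successive
    rounds of mutagenesis.
    """
    residues = list(seq)
    for token in notation.split("/"):
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token.strip())
        if m is None:
            raise ValueError(f"cannot parse substitution {token!r}")
        wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
        idx = pos - first_residue
        if not 0 <= idx < len(residues):
            raise IndexError(f"position {pos} lies outside the sequence")
        if residues[idx] != wt:
            raise ValueError(f"expected {wt} at {pos}, found {residues[idx]}")
        residues[idx] = new  # this "round" produces the next template
    return "".join(residues)


fragment = "QDNNSHGT"  # hypothetical fragment, numbered here as residues 59-66
print(apply_mutations(fragment, "N62D", first_residue=59))  # QDNDSHGT
```

The wild-type check mirrors the practical point of the multi-round method: a later round only makes sense if the template already carries the expected sequence.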
Cleavage of Fusion Proteins With Subtilisin Variants

A fusion protein is any polypeptide that contains within it an affinity domain (AD) that usually aids
in protein purification, a protease cleavage sequence or substrate linker (SL) which is cleaved by a protease,
and a protein product of interest (PP). Such fusion proteins are generally expressed by recombinant DNA
technology. The genes for fusion proteins are designed so that the SL is between the AD and PP. These
usually take the form AD-SL-PP, such that the AD is the domain closest to the N-terminus and the PP is
closest to the C-terminus.

Examples of AD include glutathione-S-transferase, which binds to glutathione; protein A (or
derivatives or fragments thereof), which binds IgG molecules; poly-histidine sequences, particularly (His)6
(SEQ ID NO: 51), that bind metal affinity columns; maltose binding protein, which binds maltose; human
growth hormone, which binds the human growth hormone receptor; or any of a variety of other proteins or
protein domains that can bind to an immobilized affinity support with an association constant (Ka) of
> 10^5 M^-1.

The SL can be any sequence which is cleaved by the subtilisin variants of the present invention. In
preparations where the variant N62D/Y104D/G166D or its equivalent is used, the SL can be any sequence,
preferably of at least 4 amino acids, in which the P4, P2, and P1 residues are basic. Therefore an SL
is employed of the general formula P4-P3-P2-P1, wherein P4, P2, and P1 are basic amino acid
residues. Preferred SLs according to this aspect of the invention include Lys-Ala-Lys-Arg (SEQ ID NO: 87)
and Arg-Ala-Lys-Arg (SEQ ID NO: 86).
Likewise, where the N62D/G166D subtilisin variant is contemplated, the SL preferably contains dibasic
residues. For the variants capable of cleaving substrates containing dibasic residues, the SL should be
at least four residues long and preferably contain a large hydrophobic residue at P4 (such as Leu or Met) and
dibasic residues at P2 and P1 (such as Arg and Lys). A particularly good substrate is Leu-Met-Arg-Lys
(SEQ ID NO: 52), but a variety of other sequences may work, including Ala-Ser-Arg-Arg (SEQ ID NO: 50)
and even Leu-Thr-Ala-Arg (SEQ ID NO: 53).

It is often useful for the SL to contain a flexible segment on its N-terminus to better separate it from the
AD and PP. Such sequences include Gly-Pro-Gly-Gly (SEQ ID NO: 54) but can be as simple as Gly-Gly
or Pro-Gly. Thus, an example of a particularly good SL would have the sequence
Gly-Pro-Gly-Gly-Leu-Met-Arg-Lys (SEQ ID NO: 88) in the case of subtilisin variants capable of cleaving
substrates containing dibasic amino acids, or Gly-Pro-Gly-Gly-Lys-Ala-Lys-Arg (SEQ ID NO: 89) in the
case of the N62D/Y104D/G166D variant. This sequence would be inserted between the AD and PP domains.

The PP can be virtually any protein or peptide of interest but preferably should not have Pro, Ile,
Thr, Val, Asp or Glu as its first residue (P1'), Pro or Gly as its second residue (P2'), or Pro as its third
residue (P3'). Such residues are poor substrates for the enzyme and may impair the ability of the subtilisin
variant to cleave the SL sequence.
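The SL and PP preferences above can be collected into a small design check. The sketch below assumes one-letter amino acid codes, treats only Lys and Arg as basic, and assumes cleavage directly after the last SL residue (P1), so that P4-P1 are the last four SL residues and P1'-P3' are the first three PP residues. `check_linker` is a hypothetical helper, and the PP N-termini used in the demo are invented for illustration.

```python
BASIC = {"K", "R"}             # assumption: only Lys and Arg count as basic
POOR_P1_PRIME = set("PITVDE")  # Pro, Ile, Thr, Val, Asp, Glu
POOR_P2_PRIME = set("PG")      # Pro, Gly
POOR_P3_PRIME = {"P"}          # Pro


def check_linker(sl, pp, triple_mutant=True):
    """Return warnings for an AD-SL-PP design (one-letter codes).

    Assumes cleavage directly after the last SL residue, so P4-P1 are the
    last four SL residues and P1'-P3' are the first three PP residues.
    """
    if len(sl) < 4:
        return ["SL should be at least four residues"]
    warnings = []
    p4, _p3, p2, p1 = sl[-4:]
    if triple_mutant:
        # N62D/Y104D/G166D: basic residues at P4, P2 and P1.
        for name, res in (("P4", p4), ("P2", p2), ("P1", p1)):
            if res not in BASIC:
                warnings.append(f"{name} ({res}) is not basic")
    else:
        # N62D/G166D: dibasic P2-P1, preferably Leu or Met at P4.
        if p2 not in BASIC or p1 not in BASIC:
            warnings.append("P2-P1 is not dibasic")
        if p4 not in {"L", "M"}:
            warnings.append(f"P4 ({p4}) is not Leu or Met")
    # Residues at the start of the PP that the text lists as poor substrates.
    for name, idx, poor in (("P1'", 0, POOR_P1_PRIME),
                            ("P2'", 1, POOR_P2_PRIME),
                            ("P3'", 2, POOR_P3_PRIME)):
        if idx < len(pp) and pp[idx] in poor:
            warnings.append(f"{name} ({pp[idx]}) is a poor substrate residue")
    return warnings


print(check_linker("GPGGKAKR", "SAEQ"))  # []
print(check_linker("GPGGLMRK", "PAEQ", triple_mutant=False))
```

The first call uses the Gly-Pro-Gly-Gly-Lys-Ala-Lys-Arg linker given above for the triple variant; the second places Pro at P1', which the text lists as a poor residue, so it returns one warning.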

Cleavage of the fusion protein is best carried out in aqueous solution, although it should
be possible to immobilize the enzyme and cleave the soluble fusion protein. It may also be possible to cleave
the fusion protein while it remains immobilized on a solid support (e.g., bound to the solid support through
the AD) with the soluble subtilisin variant. It is preferable to add the enzyme to the fusion protein so that
the enzyme is less than one part in 100 (1:100) by weight. A good buffer is 10-50 mM Tris (pH 8.2) in
10 mM NaCl. A preferable temperature is about 25 °C, although the enzyme is active up to 65 °C. The extent
of cleavage can be assayed by applying samples to SDS-PAGE. Generally, suitable conditions for using the
subtilisin variants of this invention do not depart substantially from those known in the art for the use of
other subtilisins.
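As a worked example of the 1:100 (w/w) guideline, the upper limit on enzyme follows directly from the mass of fusion protein. The digest size and enzyme stock concentration below are hypothetical; note that 1 mg/mL equals 1 µg/µL, which is why micrograms divided by mg/mL gives microlitres. `max_enzyme` is an illustrative name.

```python
def max_enzyme(fusion_mg, stock_mg_per_ml, ratio=100):
    """Upper bound on subtilisin variant for a 1:ratio (w/w) digest.

    Returns (micrograms of enzyme, microlitres of enzyme stock).
    """
    enzyme_ug = fusion_mg * 1000.0 / ratio   # 1:100 w/w -> fusion mass / 100
    volume_ul = enzyme_ug / stock_mg_per_ml  # ug / (mg/mL) == uL
    return enzyme_ug, volume_ul


# Hypothetical digest: 2 mg fusion protein, enzyme stock at 0.5 mg/mL.
print(max_enzyme(2.0, 0.5))  # (20.0, 40.0)
```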

EXAMPLES 

In the examples below and elsewhere, the following abbreviations are employed: subtilisin BPN',
subtilisin from Bacillus amyloliquefaciens; Boc-RVRR-MCA (SEQ ID NO: 73),
N-t-butoxycarbonyl-arginine-valine-arginine-arginine-7-amido-4-methylcoumarin;
suc-Ala-Ala-Pro-Phe-pNA (SEQ ID NO: 56), N-succinyl-alanine-alanine-proline-phenylalanine-p-nitroanilide;
hGH, human growth hormone; hGHbp, extracellular domain of the hGH receptor; PBS, phosphate buffered
saline; AP, alkaline phosphatase.

Example 1 

Construction and Purification of Subtilisin Mutants.
Site-directed mutations were introduced into the subtilisin BPN' gene cloned into the phagemid pSS5
(Wells, J. A., Ferrari, E., Henner, D. J., Estell, D. A. and Chen, E. Y. (1983) Nucleic Acids Res.
11:7911-7929). Single-stranded uracil-containing pSS5 template was prepared and mutagenesis performed
using the method of Kunkel (Kunkel, T. A., Bebenek, K. and McClary, J. (1991) Methods Enzymol.
204:125-139).
For example, the synthetic oligonucleotide N62D,

5'-CCAAGACAACG*ACTCTCACGGAA-3' (SEQ ID NO: 25),

in which the asterisk denotes a mismatch to the wild-type sequence, was used to construct the N62D mutant.
The oligonucleotide was first phosphorylated at the 5' end using T4 polynucleotide kinase according to a
described procedure (Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) in "Molecular Cloning: A
Laboratory Manual," Second Edition, Cold Spring Harbor, N.Y.). The phosphorylated oligonucleotide was
annealed to single-stranded uracil-containing pSS5 template, the complementary DNA strand was filled in
with deoxynucleotides using T7 DNA polymerase, and the resulting nicks were ligated using T4 DNA ligase
according to a previously described procedure (Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) in
"Molecular Cloning: A Laboratory Manual," Second Edition, Cold Spring Harbor, N.Y.). Heteroduplex
DNA was transformed into the E. coli host JM101 (Yanisch-Perron, C., Vieira, J., and Messing, J. (1985)
Gene 33:103-119), and putative mutants were confirmed by preparation and dideoxynucleotide sequencing of
single-stranded DNA (Sanger, F., Nicklen, S. and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA
74:5463-5467) according to the SEQUENASE protocol (USB Biochemicals). Mutant single-stranded
DNA was then retransformed into JM101 cells and double-stranded DNA prepared according to a previously
described procedure (Sambrook, J., Fritsch, E. F., and Maniatis, T. (1989) in "Molecular Cloning: A
Laboratory Manual," Second Edition, Cold Spring Harbor, N.Y.). For other mutations also requiring the use
of one primer, the oligonucleotides used are listed in Table 2. For several of these oligonucleotides,
additional silent mutations creating new restriction sites were simultaneously introduced to provide an
alternative verification of mutagenesis.
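The asterisk convention in the N62D primer can be checked mechanically: compare the oligonucleotide with the template, mark each mismatched base, and translate both in frame. This sketch assumes the wild-type template is the primer with the starred G reverted to A (codon AAC for Asn 62), and that reading frame 1 of the primer begins at the Gln codon of residue 59 in the numbering of the sequence listing. The function names are illustrative; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def mark_mismatches(template, oligo):
    """Write the oligo with '*' after each base that differs from the template."""
    if len(template) != len(oligo):
        raise ValueError("sequences must be the same length")
    return "".join(b + ("*" if b != t else "") for t, b in zip(template, oligo))


def translate(dna, frame=0):
    """Translate the complete codons that start at the given frame offset."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


wild_type = "CCAAGACAACAACTCTCACGGAA"  # assumed template (starred base reverted)
n62d = "CCAAGACAACGACTCTCACGGAA"       # N62D primer without the asterisk
print(mark_mismatches(wild_type, n62d))  # CCAAGACAACG*ACTCTCACGGAA
print(translate(wild_type, frame=1))     # QDNNSHG (assumed residues 59-65)
print(translate(n62d, frame=1))          # QDNDSHG
```

A single A-to-G change converts AAC (Asn) to GAC (Asp), which is why the N62D primer carries only one mismatch.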

To construct the double mutants N62D/G166D and N62D/G166E, pSS5 DNA containing the N62D
mutation was produced in single-stranded uracil-containing form using the Kunkel procedure (Kunkel, T.
A., Bebenek, K. and McClary, J. (1991) Methods Enzymol. 204:125-139). This mutant DNA was used as
template for the further introduction of the G166D or G166E mutations, using the appropriate
oligonucleotide primers (see sequences in Table 2), following the procedures described above.

To construct the triple mutants, such as N62D/Y104D/G166D, pSS5 DNA containing the
N62D/G166D mutation or another appropriate double mutation was produced in single-stranded
uracil-containing form using the Kunkel procedure (Kunkel, T. A., Bebenek, K. and McClary, J. (1991)
Methods Enzymol. 204:125-139). This mutant DNA was used as template for the further introduction of the
Y104D mutation, using the appropriate oligonucleotide primers (see sequences in Table 4), following the
procedures described above.

For expression of the subtilisin BPN' mutants, double-stranded mutant DNA was transformed into a
protease-deficient strain (BG2036) of Bacillus subtilis (Yang, M. Y., Ferrari, E. and Henner, D. J.
(1984) Journal of Bacteriology 160:15-21) according to a previous method (Anagnostopoulos, C. and
Spizizen, J. (1961) Journal of Bacteriology 81:741-746) in which transformation mixtures were plated out
on LB plus skim milk plates containing 12.5 µg/mL chloramphenicol. The clear halos of skim milk digestion
surrounding transformed colonies were noted to roughly estimate secreted protease activity.

The transformed BG2036 strains were cultured by inoculating 5 mL of 2xYT medium (Miller, J. H.
(1972) in "Experiments in Molecular Genetics," Cold Spring Harbor, N.Y.) containing 12.5 µg/mL
chloramphenicol and 2 mM CaCl2 at 37 °C for 18-20 h, followed by 1:100 dilution in the same medium and
growth in shake flasks at 37 °C for 18-22 h with vigorous aeration. The cells were harvested by
centrifugation (6000 g, 15 min, 4 °C), and 20 mM (final) CaCl2 and one volume of ethanol (-20 °C) were
added to the supernatant. After 30 min at 4 °C, the solution was centrifuged (12,000 g, 15 min, 4 °C), and
one volume of ethanol (-20 °C) was added to the supernatant. After 2 h at -20 °C, the solution was
centrifuged (12,000 g, 15 min, 4 °C), and the pellet was resuspended in and dialyzed against MC (25 mM
2-(N-morpholino)ethanesulfonic acid (MES), 5 mM CaCl2, pH 5.5) overnight at 4 °C. The dialysate was
passed through a 0.22 µm syringe filter and loaded onto a Mono S cation exchange column run by an FPLC
system (Pharmacia Biotechnology). The column was washed with 20 volumes of MC, and mutant subtilisin was
eluted over a linear gradient of zero to 0.15 M NaCl in MC, all at a flow rate of 1 mL/min. Peak fractions
were recovered and the subtilisin mutant quantitated by measuring the absorbance at 280 nm
(E280 0.1% = 1.17) (Matsubara, H., Kasper, C. B., Brown, D. M., and Smith, E. L. (1965) J. Biol. Chem.
240:1125-1130).
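The A280 quantitation amounts to dividing the absorbance by the 0.1% extinction coefficient. The sketch below uses the E280 (0.1%) = 1.17 value from the text and a 1 cm path; the conversion to micromolar assumes a molecular mass of about 27.5 kDa for mature subtilisin BPN', a figure not stated in this document. The function names and the absorbance reading are hypothetical.

```python
E280_PERCENT = 1.17  # A280 of a 0.1% (1 mg/mL) solution, 1 cm path (value from the text)


def subtilisin_mg_per_ml(a280, path_cm=1.0, dilution=1.0):
    """Protein concentration (mg/mL) from A280 using E280(0.1%) = 1.17."""
    return a280 * dilution / (E280_PERCENT * path_cm)


def to_micromolar(mg_per_ml, mw_da=27_500):
    """Convert mg/mL to uM; the ~27.5 kDa molecular mass is an assumption."""
    return mg_per_ml / mw_da * 1e6  # (g/L) / (g/mol) = mol/L, then scale to uM


conc = subtilisin_mg_per_ml(0.585)  # hypothetical peak-fraction reading
print(f"{conc:.3f} mg/mL, about {to_micromolar(conc):.1f} uM")
```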
Example 2

Kinetic Characterizations 

Subtilisins were assayed by measuring the initial rates of hydrolysis of p-nitroanilide tetrapeptide
substrates in 0.4 mL 20 mM Tris-Cl pH 8.2, 4% (v/v) dimethyl sulfoxide at (25 ± 0.2) °C as described
previously (Estell, D. A., Graycar, T. P., Miller, J. V., Powers, D., Burnier, J. P., Ng, P. G. and Wells,
J. A. (1986) Science 233:659-663). Enzyme concentrations, [E]0, were determined spectrophotometrically
using E280 0.1% = 1.17 (Matsubara, H., Kasper, C. B., Brown, D. M., and Smith, E. L. (1965) J. Biol.
Chem. 240:1125-1130) and were typically 5-50 nM in reactions. Initial rates were determined for nine to
twelve different substrate concentrations over the range of 0.001-2.0 mM. Plots of initial rates (v) versus
substrate concentration ([S]) were fitted to the Michaelis-Menten equation,

$$v = \frac{k_{cat}[E]_0[S]}{K_m + [S]}$$

to determine the kinetic constants kcat and Km (Fersht, A. in "Enzyme Structure and Mechanism", Second
edition, Freeman and Co., N.Y.) using the program KaleidaGraph (Synergy Software, Reading, PA).
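The kinetic constants can be obtained by nonlinear least squares on the Michaelis-Menten equation. KaleidaGraph was used in the original work; the sketch below is an independent, dependency-free alternative, not a reproduction of that program's fitting routine. Because v is linear in Vmax = kcat[E]0 for a fixed Km, it solves for Vmax in closed form and searches only over Km (golden-section search on log10 Km). The rates are synthetic and noise-free, and the 10 nM enzyme concentration is hypothetical.

```python
import math


def fit_michaelis_menten(s, v, km_lo=1e-5, km_hi=10.0, iters=100):
    """Least-squares fit of v = Vmax*[S]/(Km + [S]), with Vmax = kcat*[E]0.

    For any fixed Km the best-fitting Vmax has a closed form (linear least
    squares), so only Km is searched, by golden-section search on log10(Km).
    Returns (Vmax, Km).
    """
    def best_vmax(km):
        f = [si / (km + si) for si in s]
        return sum(vi * fi for vi, fi in zip(v, f)) / sum(fi * fi for fi in f)

    def sse(log_km):
        km = 10 ** log_km
        vmax = best_vmax(km)
        return sum((vi - vmax * si / (km + si)) ** 2 for si, vi in zip(s, v))

    a, b = math.log10(km_lo), math.log10(km_hi)
    g = (math.sqrt(5) - 1) / 2  # golden-ratio conjugate, ~0.618
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if sse(c) < sse(d):  # minimum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                # minimum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    km = 10 ** ((a + b) / 2)
    return best_vmax(km), km


# Synthetic, noise-free rates (not data from this work): Vmax = 2.0 uM/min, Km = 0.1 mM,
# sampled across the 0.001-2.0 mM range used in the text.
s_mM = [0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
v = [2.0 * x / (0.1 + x) for x in s_mM]
vmax, km = fit_michaelis_menten(s_mM, v)
e0_uM = 0.01         # hypothetical [E]0 of 10 nM, within the 5-50 nM range
kcat = vmax / e0_uM  # per minute, given v in uM/min and [E]0 in uM
print(round(km, 4), round(vmax, 4), round(kcat, 1))
```

Fitting v against [S] directly avoids the error distortion of linearized (Lineweaver-Burk) plots; with real data the search bounds should bracket the expected Km.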

Example 3 

Substrate Phage 

Substrate phage selections were performed as described by Matthews and Wells (Matthews, D. J. and
Wells, J. A. (1993) Science 260:1113-1117), with minor modifications. Phage sorting was carried out using
a library in which the linker sequence between the gene III coat protein and a tight-binding variant of hGH
was GPGGX5GGPG (SEQ ID NO: 52). The library contained 2 x 10* independent transformants. Phage
particles were prepared by infecting 1 mL of log-phase Escherichia coli 27C7 (F'/tet^r ompT degP) with
approximately 10 1 library phage for 1 h at 37 °C, followed by 18-24 h of growth in 25 mL 2YT medium
containing 10^10 M13KO7 helper phage and 50 µg/mL carbenicillin at 37 °C. Wells of a 96-well Nunc
MaxiSorp microtiter plate were coated with 2 µg/mL of hGHbp in 50 mM NaHCO3 at pH 9.6 overnight at
4 °C and blocked with PBS (10 mM sodium phosphate at pH 7.4 and 150 mM NaCl) containing 2.5% (w/v)
skim milk for 1 h at room temperature. Between 10 u and 10 1J phage in 0.1 mL 10 mM Tris-Cl (pH 7.6),
1 mM EDTA, and 100 mM NaCl were incubated in the wells at room temperature for 2 h with gentle agitation.
The plate was washed first with 20 rinses of PBS plus 0.05% Tween 20 and then twice with 20 mM Tris-Cl
at pH 8.2. The N62D/G166D subtilisin was added in 0.1 mL of 20 mM Tris-Cl at pH 8.2, and
protease-sensitive phage were eluted after a variable reaction time. The concentration of protease and
incubation times for elution of sensitive phage were decreased gradually over the course of the sorting
procedure to increase selectivity, with protease concentrations of 0.2 nM (rounds 1-3) and 0.1 nM (rounds
4-9), and reaction times of 5 min (rounds 1-6), 2.5 min (round 7), 40 s (round 8) and 20 s (round 9).
Control wells in which no protease was added were also included in each round. For the resistant phage
pool, the incubation time with protease remained constant at 5 min. The wells were then washed ten times
with PBS plus 0.05% Tween 20, and resistant phage were eluted by treatment with 0.1 mL of 0.2 M glycine
at pH 2.0 in PBS plus 0.05% Tween 20 for 1 min at room temperature. Protease-sensitive and
protease-resistant phage pools were titered and used to infect log-phase 27C7 cells for 1 h at 37 °C,
followed by centrifugation at 4000 rpm, removal of the supernatant, and resuspension in 1 mL 2YT medium.
The infected cells were then grown 18-24 h in the presence of helper phage as described above, and the
process was repeated 9 times. Selected substrates were introduced into AP fusion proteins and assayed for
relative rates of cleavage as described by Matthews and Wells (Matthews, D. J., Goodman, L. J., Gorman,
C. M., and Wells, J. A. (1994) Protein Science 3:1197-1205 and Matthews, D. J. and Wells, J. A. (1993)
Science 260:1113-1117), except that the cleavage reactions were performed in 20 mM Tris-Cl at pH 8.2.
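The tightening of the elution conditions over nine rounds can be summarized as an enzyme dose per round (protease concentration multiplied by reaction time). The values are those given above; reading the dose as a measure of stringency is an assumption that holds only if cleavage is roughly first-order in enzyme over these short incubations.

```python
# Sorting schedule for the N62D/G166D selection, as stated in the text.
protease_nM = {r: (0.2 if r <= 3 else 0.1) for r in range(1, 10)}
time_s = {r: 300 for r in range(1, 7)}  # 5 min for rounds 1-6
time_s.update({7: 150, 8: 40, 9: 20})   # 2.5 min, 40 s, 20 s

# Enzyme "dose" per round in nM x s. Treating dose as a stringency proxy is
# an assumption (it holds if cleavage is pseudo-first-order in enzyme).
dose = [protease_nM[r] * time_s[r] for r in range(1, 10)]
for r, d in zip(range(1, 10), dose):
    print(f"round {r}: {protease_nM[r]} nM x {time_s[r]} s = {d:g} nM*s")
print(f"round 9 is {dose[0] / dose[-1]:g}-fold more stringent than round 1 by this proxy")
```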

Example 4 

Substrate phage selection and cleavage of a fusion protein

Subtilisin has the capability to bind substrates from the P4 to P3' positions (McPhalen, C. A. and
James, M. N. G. (1988) Biochemistry 27:6582-6598 and Bode, W., Papamokos, E., Musil, D., Seemueller, U.
and Fritz, H. (1986) EMBO J. 5:813-818). Given this extensive binding site and the apparently cooperative
way in which the substrate can bind the enzyme, we wished to explore the substrate preferences of the enzyme
more broadly. To do this we utilized the substrate phage selection (Matthews, D. J., Goodman, L. J.,
Gorman, C. M., and Wells, J. A. (1994) Protein Science 3:1197-1205 and Matthews, D. J. and Wells, J. A.
(1993) Science 260:1113-1117) described in Example 3. In this method a five-residue substrate linker
flanked by diglycine residues is inserted between an affinity domain (in this case a high-affinity variant of
hGH) and the carboxy-terminal domain of gene III, a minor coat protein displayed on the surface of the
filamentous phage M13. The five-residue substrate linker is fully randomized to generate a library of 20^5
(3.2 x 10^6) different protein sequence variants. These are displayed on the phage particles, which are
allowed to bind to the hGHbp. The protease of interest is added and, if it cleaves the phage particle at the
substrate linker, it releases that particle. The particles released by protease treatment can be propagated
and subjected to another round of selection to further enrich for good protease substrates. Sequences that
are retained can also be propagated to enrich for poor protease substrates. By sequencing the isolated phage
genes at the end of either selection, one can identify good and poor substrates for further analysis.
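The enrichment logic of substrate phage can be illustrated with a toy model. The sketch assumes each variant is released with probability 1 - exp(-k[E]t) (pseudo-first-order cleavage) and that released phage are amplified without bias; both are simplifying assumptions, and the rate constants and pool composition are hypothetical. It also computes the 20^5 sequence space of a fully randomized five-residue linker.

```python
import math


def enrich(freqs, rate_constants, enzyme_nM, time_s, rounds):
    """Toy model of repeated protease-release selection (an assumption, not from the text).

    A variant with rate constant k (per nM per s) is released with probability
    1 - exp(-k * [E] * t); released phage are re-amplified, so each round
    starts from the renormalized released pool.
    """
    freqs = list(freqs)
    for _ in range(rounds):
        released = [f * (1.0 - math.exp(-k * enzyme_nM * time_s))
                    for f, k in zip(freqs, rate_constants)]
        total = sum(released)
        freqs = [x / total for x in released]
    return freqs


print(20 ** 5)  # 3200000 possible five-residue linkers
# Hypothetical pool: one variant cleaved 10-fold faster than 999 others, all equally abundant.
start = [1 / 1000] * 1000
k = [1e-3] + [1e-4] * 999
after = enrich(start, k, enzyme_nM=0.1, time_s=300, rounds=5)
print(f"fast variant: {start[0]:.3f} of the pool before, {after[0]:.3f} after 5 rounds")
```

Even a modest per-round advantage compounds across rounds, which is why repeated sorting can pull a single preferred linker out of a very large library.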

We chose to focus on the subtilisin BPN' variant N62D/G166D, as it was slightly better at
discriminating the synthetic dibasic substrates from the others. We subjected the substrate phage library to
nine rounds of selection with the subtilisin variant and isolated clones that were either increasingly
sensitive or resistant to cleavage. Of twenty-one clones sequenced from the sensitive pool, eighteen
contained dibasic residues, eleven of which had the substrate linker sequence Asn-Leu-Met-Arg-Lys (SEQ ID
NO: 35) (Table 6).
TABLE 6

Substrate phage sequences sensitive or resistant to N62D/G166D subtilisin from a GG-XXXXX-GG library
after 9 rounds of selection*

Protease Sensitive Pool

Monobasic Sites (n): NLTAR (3) (SEQ ID NO: 34)
Dibasic Sites (n): NLMRK (11) (SEQ ID NO: 35); TASRR (4) (SEQ ID NO: 36); LTRRS (SEQ ID NO: 37);
AI5RX (SEQ ID NO: 38); LMLRK (SEQ ID NO: 39)

Protease Resistant Pool

Nonbasic Sites (n): ASTHP (SEQ ID NO: 40); IQQQY (SEQ ID NO: 43); QGELP (SEQ ID NO: 45);
APDPT (SEQ ID NO: 46); QLLEH (SEQ ID NO: 47); VNNNH (SEQ ID NO: 48); AQSNL (SEQ ID NO: 49)
Monobasic Sites (n): QKPNP (SEQ ID NO: 41); RPGAM (SEQ ID NO: 44)
Dibasic Sites (n): RRPTH (SEQ ID NO: 42)

* Numbers in parentheses indicate the number of times a particular DNA sequence was isolated.

Three (3) of the sensitive sequences were monobasic, Asn-Leu-Thr-Ala-Arg (SEQ ID NO: 34). It is
known that subtilisin has a preference for hydrophobic residues at the P4 position. If these and the other
selected substrates were indeed cleaved after the last basic residue, they would all have a Leu, Met or Ala
at the P4 position. Almost no basic residues were isolated in the protease-resistant pool, and those that were
had a Pro following the mono- or dibasic residue. It is known that subtilisin does not cleave substrates
containing Pro at the P1' position (Carter, P., Nilsson, B., Burnier, J., Burdick, D. and Wells, J. A. (1989)
Proteins: Struct. Funct. Genet. 6:240-248). Thus, dibasic substrates were highly selected, and these had the
additional feature of Leu, Met or Ala at the P4 position.
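The analysis of Table 6 can be reproduced with two small functions: one classifies each randomized insert by its longest run of Lys/Arg, and the other reports the P4 and P1' residues under the hypothesis stated above that cleavage follows the last basic residue. Treating only Lys and Arg as basic and embedding each insert in the GG flanks of the GG-XXXXX-GG library are assumptions of this sketch; AI5RX is left out because its sequence is not legible in the source, and the function names are illustrative.

```python
BASIC = set("KR")  # Lys and Arg; His is not counted as basic here (assumption)


def classify(insert):
    """Label a randomized insert as nonbasic, monobasic or dibasic by its longest K/R run."""
    longest = run = 0
    for aa in insert:
        run = run + 1 if aa in BASIC else 0
        longest = max(longest, run)
    return ("nonbasic", "monobasic", "dibasic")[min(longest, 2)]


def p4_and_p1_prime(insert, flank="GG"):
    """Return (P4, P1') assuming cleavage right after the last basic residue.

    The insert is embedded in its Gly-Gly flanks (GG-XXXXX-GG), so P4 or P1'
    may fall in a flanking glycine; None means the position lies outside the
    sequence shown. Raises ValueError if the insert has no basic residue.
    """
    full = flank + insert + flank
    p1 = max(i for i, aa in enumerate(full) if aa in BASIC)
    p4 = full[p1 - 3] if p1 >= 3 else None
    p1_prime = full[p1 + 1] if p1 + 1 < len(full) else None
    return p4, p1_prime


sensitive = ["NLMRK", "TASRR", "LTRRS", "LMLRK", "NLTAR"]  # from Table 6
for seq in sensitive:
    print(seq, classify(seq), p4_and_p1_prime(seq))
for seq in ["QKPNP", "RPGAM", "ASTHP"]:  # resistant pool, Table 6
    print(seq, classify(seq))
```

Run on the sensitive inserts, every P4 comes out as Leu, Met or Ala, and the basic resistant inserts QKPNP and RPGAM both have Pro at P1', in line with the reasoning above.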

Example 5 

Cleavage of Substrate Linkers

We wished to analyze how efficiently the most frequently selected sequences were cleaved in the
context of a fusion protein. For this we applied an alkaline phosphatase fusion protein assay (Matthews,
D. J., Goodman, L. J., Gorman, C. M., and Wells, J. A. (1994) Protein Science 3:1197-1205 and Matthews,
D. J. and Wells, J. A. (1993) Science 260:1113-1117). The hGH-substrate linker domains were excised from
the phage vector by PCR and fused in front of the gene for E. coli AP. The fusion protein was expressed and
purified on an hGH receptor affinity column. The fusion protein was bound to the hGH receptor on a plate
and treated with the subtilisin variant. The rate of cleavage of the fusion protein from the plate was
monitored by collecting soluble fractions as a function of time and assaying for AP activity (Figure 5). The
most frequently isolated substrate sequence, Asn-Leu-Met-Arg-Lys (SEQ ID NO: 35), was cleaved about ten
times faster than the next most frequently isolated clones (Thr-Ala-Ser-Arg-Arg (SEQ ID NO: 36) and
Asn-Leu-Thr-Ala-Arg (SEQ ID NO: 34)). The cleaved AP products were also recovered and subjected to
N-terminal sequencing to determine the sites of cleavage (Figure 5; cleavage site denoted by ↓). In all
three fusion proteins, this site immediately followed the dibasic or monobasic site, as expected from the
design of the mutant subtilisin. We also tested the dibasic sequence isolated from the resistant pool,
namely Arg-Lys-Pro-Thr-His (SEQ ID NO: 42). We observed no detectable cleavage above background for
this substrate during the assay.



The present invention has of necessity been discussed herein by reference to certain specific methods
and materials. It is to be understood that the discussion of these specific methods and materials in no way
constitutes any limitation on the scope of the present invention, which extends to any and all alternative
materials and methods suitable for accomplishing the ends of the present invention.



All references cited herein are expressly incorporated by reference. 
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Genentech, Inc.

(ii) TITLE OF INVENTION: SUBTILISIN VARIANTS CAPABLE OF CLEAVING 

SUBSTRATES CONTAINING BASIC RESIDUES 

(iii) NUMBER OF SEQUENCES: 89 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Genentech, Inc.

(B) STREET: 460 Point San Bruno Blvd 

(C) CITY: South San Francisco 

(D) STATE: California 

(E) COUNTRY: USA 

(F) ZIP: 94080 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: 3.5 inch, 1.44 Mb floppy disk

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: WinPatin (Genentech) 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 08/398028 
(B) FILING DATE: 03-MAR-1995

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Kubinec, Jeffrey S. 

(B) REGISTRATION NUMBER: 36,575 

(C) REFERENCE/DOCKET NUMBER: P0936P1PCT

(ix) TELECOMMUNICATION INFORMATION: 
(A) TELEPHONE: 415/225-8229 
(B) TELEFAX: 415/952-9881
(C) TELEX: 910/371-7168 

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 8119 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:



GAATTCNGGT CTACTAAAAT ATTATTCCAT ACTATACAAT TAATACACAG 50 

AATAATCTGT CTATTGGTTA TTCTGCAAAT GAAAAAAAGG AGAGGATAAA 100 

GA GTG AGA GGC AAA AAA GTA TGG ATC AGT TTG CTG TTT 138
Val Arg Gly Lys Lys Val Trp Ile Ser Leu Leu Phe
-107 -105 -100

GCT TTA GCG TTA ATC TTT ACG ATG GCG TTC GGC AGC ACA 177
Ala Leu Ala Leu Ile Phe Thr Met Ala Phe Gly Ser Thr
-95 -90 -85

TCC TCT GCC CAG GCG GCA GGG AAA TCA AAC GGG GAA AAG 216
Ser Ser Ala Gln Ala Ala Gly Lys Ser Asn Gly Glu Lys
-80 -75 -70

AAA TAT ATT GTC GGG TTT AAA CAG ACA ATG AGC ACG ATG 255
Lys Tyr Ile Val Gly Phe Lys Gln Thr Met Ser Thr Met
-65 -60

AGC GCC GCT AAG AAG AAA GAT GTC ATT TCT GAA AAA GGC 294
Ser Ala Ala Lys Lys Lys Asp Val Ile Ser Glu Lys Gly
-55 -50 -45

GGG AAA GTG CAA AAG CAA TTC AAA TAT GTA GAC GCA GCT 333
Gly Lys Val Gln Lys Gln Phe Lys Tyr Val Asp Ala Ala
-40 -35

TCA GCT ACA TTA AAC GAA AAA GCT GTA AAA GAA TTG AAA 372
Ser Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys
-30 -25 -20

AAA GAC CCG AGC GTC GCT TAC GTT GAA GAA GAT CAC GTA 411
Lys Asp Pro Ser Val Ala Tyr Val Glu Glu Asp His Val
-15 -10 -5

GCA CAT GCG TAC GCG CAG TCC GTG CCT TAC GGC GTA TCA 450
Ala His Ala Tyr Ala Gln Ser Val Pro Tyr Gly Val Ser
1 5

CAA ATT AAA GCC CCT GCT CTG CAC TCT CAA GGC TAC ACT 489
Gln Ile Lys Ala Pro Ala Leu His Ser Gln Gly Tyr Thr
10 15 20

GGA TCA AAT GTT AAA GTA GCG GTT ATC GAC AGC GGT ATC 528
Gly Ser Asn Val Lys Val Ala Val Ile Asp Ser Gly Ile
25 30 35

GAT TCT TCT CAT CCT GAT TTA AAG GTA GCA GGC GGA GCC 567
Asp Ser Ser His Pro Asp Leu Lys Val Ala Gly Gly Ala
40 45

AGC ATG GTT CCT TCT GAA ACA AAT CCT TTC CAA GAC AAC 606
Ser Met Val Pro Ser Glu Thr Asn Pro Phe Gln Asp Asn
50 55 60

GAC TCT CAC GGA ACT CAC GTT GCC GGC ACA GTT GCG GCT 645
Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala Ala
65 70

CTT AAT AAC TCA ATC GGT GTA TTA GGC GTT GCG CCA AGC 684
Leu Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser
75 80 85

GCA TCA CTT TAC GCT GTA AAA GTT CTC GGT GCT GAC GGT 723
Ala Ser Leu Tyr Ala Val Lys Val Leu Gly Ala Asp Gly
90 95 100

TCC GGC CAA TAC AGC TGG ATC ATT AAC GGA ATC GAG TGG 762
Ser Gly Gln Tyr Ser Trp Ile Ile Asn Gly Ile Glu Trp
105 110

GCG ATC GCA AAC AAT ATG GAC GTT ATT AAC ATG AGC CTC 801
Ala Ile Ala Asn Asn Met Asp Val Ile Asn Met Ser Leu
115 120 125
GGC GGA CCT TCT GGT TCT GCT GCT TTA AAA GCG GCA GTT 840
Gly Gly Pro Ser Gly Ser Ala Ala Leu Lys Ala Ala Val
130 135

GAT AAA GCC GTT GCA TCC GGC GTC GTA GTC GTT GCG GCA 879
Asp Lys Ala Val Ala Ser Gly Val Val Val Val Ala Ala
140 145 150

GCC GGT AAC GAA GGC ACT TCC GGC AGC TCG TCG ACA GTG 918
Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser Thr Val
155 160 165

GAC TAC CCT GGC AAA TAC CCT TCT GTC ATT GCA GTA GGC 957
Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly
170 175

GCT GTT GAC AGC AGC AAC CAA AGA GCA TCT TTC TCA AGC 996
Ala Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser
180 185 190

GTA GGA CCT GAG CTT GAT GTC ATG GCA CCT GGC GTA TCT 1035
Val Gly Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser
195 200

ATC CAA AGC ACG CTT CCT GGA AAC AAA TAC GGG GCG TAC 1074
Ile Gln Ser Thr Leu Pro Gly Asn Lys Tyr Gly Ala Tyr
205 210 215

AAC GGT ACC TCA ATG GCA TCT CCG CAC GTT GCC GGA GCG 1113
Asn Gly Thr Ser Met Ala Ser Pro His Val Ala Gly Ala
220 225 230

GCT GCT TTG ATT CTT TCT AAG CAC CCG AAC TGG ACA AAC 1152
Ala Ala Leu Ile Leu Ser Lys His Pro Asn Trp Thr Asn
235 240

ACT CAA GTC CGC AGC AGT TTA GAA AAC ACC ACT ACA AAA 1191
Thr Gln Val Arg Ser Ser Leu Glu Asn Thr Thr Thr Lys
245 250 255

CTT GGT GAT TCT TTC TAC TAT GGA AAA GGG CTG ATC AAC 1230
Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile Asn
260 265

GTA CAG GCG GCA GCT CAG TA AAACATAAAA AACCGGCCTT 1270
Val Gln Ala Ala Ala Gln
270 275



GGCCCCGCCG GTTTTTTATT ATTTTTCTTC CTCCGCATGT TCAATCCGCT 1320
CCATAATCGA CGGATGGCTC CCTCTGAAAA TTTTAACGAG AAACGGCGGG 1370
TTGACCCGGC TCAGTCCCGT AACGGCCAAG TCCTGAAACG TCTCAATCGC 1420
CGCTTCCCGG TTTCCGGTCA GCTCAATGCC GTAACGGTCG GCGGCGTTTT 1470
CCTGATACCG GGAGACGGCA TTCGTAATCG GATCCGGAAA TTGTAAACGT 1520
TAATATTTTG TTAAAATTCG CGTTAAATTT TTGTTAAATC AGCTCATTTT 1570
TTAACCAATA GGCCGAAATC GGCAAAATCC CTTATAAATC AAAAGAATAG 1620
ACCGAGATAG GGTTGAGTGT TGTTCCAGTT TGGAACAAGA GTCCACTATT 1670
AAAGAACGTG GACTCCAACG TCAAAGGGCG AAAAACCGTC TATCAGGGCT 1720
ATGGCCCACT ACGTGAACCA TCACCCTAAT CAAGTTTTTT GGGGTCGAGG 1770
TGCCGTAAAG CACTAAATCG GAACCCTAAA GGGAGCCCCC GATTTAGAGC 1820
TTGACGGGGA AAGCCGGCGA ACGTGGCGAG AAAGGAAGGG AAGAAAGCGA 1870
AAGGAGCGGG CGCTAGGGCG CTGGCAAGTG TAGCGGTCAC GCTGCGCGTA 1920
ACCACCACAC CCGCCGCGCT TAATGCGCCG CTACAGGGCG CGTCCGGATC 1970
NGATCCGACG CGAGGCTGGA TGGCCTTCCC CATTATGATT CTTCTCGCTT 2020
CCGGCGGCAT CGGGATGCCC GCGTTGCAGG CCATGCTGTC CAGGCAGGTA 2070
GATGACGACC ATCAGGGACA GCTTCAAGGA TCGCTCGCGG CTCTTACCAG 2120
CCTAACTTCG ATCACTGGAC CGCTGATCGT CACGGCGATT TATGCCGCCT 2170
CGGCGAGCAC ATGGAACGGG TTGGCATGGA TTGTAGGCGC CGCCCTATAC 2220
CTTGTCTGCC TCCCCGCGTT GCGTCGCGGT GCATGGAGCC GGGCCACCTC 2270
GACCTGAATG GAAGCCGGCG GCACCTCGCT AACGGATTCA CCACTCCAAG 2320
AATTGGAGCC AATCAATTCT TGCGGAGAAC TGTGAATGCG CAAACCAACC 2370
CTTGGCAGAA CATATCCATC GCGTCCGCCA TCTCCAGCAG CCGCACGCGG 2420
CGCATCTCGG GCCGCGTTGC TGGCGTTTTT CCATAGGCTC CGCCCCCCTG 2470
ACGAGCATCA CAAAAATCGA CGCTCAAGTC AGAGGTGGCG AAACCCGACA 2520
GGACTATAAA GATACCAGGC GTTTCCCCCT GGAAGCTCCC TCGTGCGCTC 2570
TCCTGTTCCG ACCCTGCCGC TTACCGGATA CCTGTCCGCC TTTCTCCCTT 2620
CGGGAAGCGT GGCGCTTTCT CAATGCTCAC GCTGTAGGTA TCTCAGTTCG 2670
GTGTAGGTCG TTCGCTCCAA GCTGGGCTGT GTGCACGAAC CCCCCGTTCA 2720
GCCCGACCGC TGCGCCTTAT CCGGTAACTA TCGTCTTGAG TCCAACCCGG 2770
TAAGACACGA CTTATCGCCA CTGGCAGCAG CCACTGGTAA CAGGATTAGC 2820
AGAGCGAGGT ATGTAGGCGG TGCTACAGAG TTCTTGAAGT GGTGGCCTAA 2870
CTACGGCTAC ACTAGAAGGA CAGTATTTGG TATCTGCGCT CTGCTGAAGC 2920
CAGTTACCTT CGGAAAAAGA GTTGGTAGCT CTTGATCCGG CAAACAAACC 2970
ACCGCTGGTA GCGGTGGTTT TTTTGTTTGC AAGCAGCAGA TTACGCGCAG 3020
AAAAAAAGGA TCTCAAGAAG ATCCTTTGAT CTTTTCTACG GGGTCTGACG 3070
CTCAGTGGAA CGAAAACTCA CGTTAAGGGA TTTTGGTCAT GAGATTATCA 3120
AAAAGGATCT TCACCTAGAT CCTTTTAAAT TAAAAATGAA GTTTTAAATC 3170
AATCTAAAGT ATATATGAGT AAACTTGGTC TGACAGTTAC CAATGCTTAA 3220
TCAGTGAGGC ACCTATCTCA GCGATCTGTC TATTTCGTTC ATCCATAGTT 3270
GCCTGACTCC CCGTCGTGTA GATAACTACG ATACGGGAGG GCTTACCATC 3320
TGGCCCCAGT GCTGCAATGA TACCGCGAGA CCCACGCTCA CCGGCTCCAG 3370
ATTTATCAGC AATAAACCAG CCAGCCGGAA GGGCCGAGCG CAGAAGTGGT 3420
CCTGCAACTT TATCCGCCTC CATCCAGTCT ATTAATTGTT GCCGGGAAGC 3470
TAGAGTAAGT AGTTCGCCAG TTAATAGTTT GCGCAACGTT GTTGCCATTG 3520
CTGCAGGCAT CGTGGTGTCA CGCTCGTCGT TTGGTATGGC TTCATTCAGC 3570
TCCGGTTCCC AACGATCAAG GCGAGTTACA TGATCCCCCA TGTTGTGCAA 3620
AAAAGCGGTT AGCTCCTTCG GTCCTCCGAT CGTTGTCAGA AGTAAGTTGG 3670
CCGCAGTGTT ATCACTCATG GTTATGGCAG CACTGCATAA TTCTCTTACT 3720
GTCATGCCAT CCGTAAGATG CTTTTCTGTG ACTGGTGAGT ACTCAACCAA 3770
GTCATTCTGA GAATAGTGTA TGCGGCGACC GAGTTGCTCT TGCCCGGCGT 3820
CAACACGGGA TAATACCGCG CCACATAGCA GAACTTTAAA AGTGCTCATC 3870
ATTGGAAAAC GTTCTTCGGG GCGAAAACTC TCAAGGATCT TACCGCTGTT 3920
GAGATCCAGT TCGATGTAAC CCACTCGTGC ACCCAACTGA TCTTCAGCAT 3970
CTTTTACTTT CACCAGCGTT TCTGGGTGAG CAAAAACAGG AAGGCAAAAT 4020
GCCGCAAAAA AGGGAATAAG GGCGACACGG AAATGTTGAA TACTCATACT 4070
CTTCCTTTTT CAATATTATT GAAGCATTTA TCAGGGTTAT TGTCTCATGA 4120
GCGGATACAT ATTTGAATGT ATTTAGAAAA ATAAACAAAT AGGGGTTCCG 4170
CGCACATTTC CCCGAAAAGT GCCACCTGAC GTCTAAGAAA CCATTATTAT 4220
CATGACATTA ACCTATAAAA ATAGGCGTAT CACGAGGCCC TTTCGTCTTC 4270
AAGAATTAAT TCCTTAAGGA ACGTACAGAC GGCTTAAAAG CCTTTAAAAA 4320
CGTTTTTAAG GGGTTTGTAG ACAAGGTAAA GGATAAAACA GCACAATTCC 4370
AAGAAAAACA CGATTTAGAA CCTAAAAAGA ACGAATTTGA ACTAACTCAT 4420
AACCGAGAGG TAAAAAAAGA ACGAAGTCGA GATCAGGGAA TGAGTTTATA 4470
AAATAAAAAA AGCACCTGAA AAGGTGTCTT TTTTTGATGG TTTTGAACTT 4520
GTTCTTTCTT ATCTTGATAC ATATAGAAAT AACGTCATTT TTATTTTAGT 4570
TGCTGAAAGG TGCGTTGAAG TGTTGGTATG TATGTGTTTT AAAGTATTGA 4620
AAACCCTTAA AATTGGTTGC ACAGAAAAAC CCCATCTGTT AAAGTTATAA 4670
GTGACTAAAC AAATAACTAA ATAGATGGGG GTTTCTTTTA ATATTATGTG 4720
TCCTAATAGT AGCATTTATT CAGATGAAAA ATCAAGGGTT TTAGTGGACA 4770
AGACAAAAAG TGGAAAAGTG AGACCATGGA GAGAAAAGAA AATCGCTAAT 4820
GTTGATTACT TTGAACTTCT GCATATTCTT GAATTTAAAA AGGCTGAAAG 4870
AGTAAAAGAT TGTGCTGAAA TATTAGAGTA TAAACAAAAT CGTGAAACAG 4920
GCGAAAGAAA GTTGTATCGA GTGTGGTTTT GTAAATCCAG GCTTTGTCCA 4970
ATGTGCAACT GGAGGAGAGC AATGAAACAT GGCATTCAGT CACAAAAGGT 5020
TGTTGCTGAA GTTATTAAAC AAAAGCCAAC AGTTCGTTGG TTGTTTCTCA 5070
CATTAACAGT TAAAAATGTT TATGATGGCG AAGAATTAAA TAAGAGTTTG 5120
TCAGATATGG CTCAAGGATT TCGCCGAATG ATGCAATATA AAAAAATTAA 5170 
TAAAAATCTT GTTGGTTTTA TGCGTGCAAC G6AAGTGACA ATAAATAATA 5220 
AAGATAATTC TTATAATCAG CACATGCATG TATTGGTATG TGTGGAACCA 5270 
ACTTATTTTA AGAATACAGA AAACTACGTG AATCAAAAAC AATGGATTCA 5320 
ATTTTGGAAA AAGGCAATGA AATTAGACTA TGATCCAAAT GTAAAAGTTC 5370 
AAATGATTCG ACCGAAAAAT AAATATAAAT CGGATATACA ATCGGCAATT 5420 
GACGAAACTG CAAAATATCC TGTAAAGGAT ACGGATTTTA TGACCGATGA 5470 
TGAAGAAAAG AATTTGAAAC GTTTGTCTGA TTTGGAGGAA GGTTTACACC 5520 
GTAAAAGGTT AATCTCCTAT GGTGGTTTGT TAAAAGAAAT ACATAAAAAA 5570 
TTAAACCTTG ATGACACAGA AGAAGGCGAT TTGATTCATA CAGATGATGA 5620 
CGAAAAAGCC GATGAAGATG GATTTTCTAT TATTGCAATG TGGAATTGGG 5670 
AACGGAAAAA TTATTTTATT AAAGAGTAGT TCAACAAACG GGCCAGTTTG 5720 
TTGAAGATTA GATGCTATAA TTGTTATTAA AAGGATTGAA GGATGCTTAG 5770 
GAAGACGAGT TATTAATAGC TGAATAAGAA CGGTGCTCTC CAAATATTCT 5820
TATTTAGAAA AGCAAATCTA AAATTATCTG AAAAGGGAAT GAGAATAGTG 5870
AATGGACCAA TAATAATGAC TAGAGAAGAA AGAATGAAGA TTGTTCATGA 5920 
AATTAAGGAA CGAATATTGG ATAAATATGG GGATGATGTT AAGGCTATTG 5970 
GTGTTTATGG CTCTCTTGGT CGTCAGACTG ATGGGCCCTA TTCGGATATT 6020 
GAGATGATGT GTGTCATGTC AACAGAGGAA GCAGAGTTCA GCCATGAATG 6070 
GACAACCGGT GAGTGGAAGG TGGAAGTGAA TTTTGATAGC GAAGAGATTC 6120 
TACTAGATTA TGCATCTCAG GTGGAATCAG ATTGGCCGCT TACACATGGT 6170 
CAATTTTTCT CTATTTTGCC GATTTATGAT TCAGGTGGAT ACTTAGAGAA 6220 
AGTGTATCAA ACTGCTAAAT CGGTAGAAGC CCAAACGTTC CACGATGCGA 6270 
TTTGTGCCCT TATCGTAGAA GAGCTGTTTG AATATGCAGG CAAATGGCGT 6320 
AATATTCGTG TGCAAGGACC GACAACATTT CTACCATCCT TGACTGTACA 6370 
GGTAGCAATG GCAGGTGCCA TGTTGATTGG TCTGCATCAT CGCATCTGTT 6420 
ATACGACGAG CGCTTCGGTC TTAACTGAAG CAGTTAAGCA ATCAGATCTT 6470 
CCTTCAGGTT ATGACCATCT GTGCCAGTTC GTAATGTCTG GTCAACTTTC 6520 
CGACTCTGAG AAACTTCTGG AATCGCTAGA GAATTTCTGG AATGGGATTC 6570 
AGGAGTGGAC AGAACGACAC GGATATATAG TGGATGTGTC AAAACGCATA 6620 
CCATTTTGAA CGATGACCTC TAATAATTGT TAATCATGTT GGTTACGTAT 6670 
TTATTAACTT CTCCTAGTAT TAQTAATTAT CATGGCTGTC ATGGCGCATT 6720
AACGGAATAA AGGGTGTGCT TAAATCGGGC CATTTTGCGT AATAAGAAAA 6770 
AGGATTAATT ATGAGCGAAT TGAATTAATA ATAAGGTAAT AGATTTACAT 6820 
TAGAAAATGA AAGGGGATTT TATGCGTGAG AATGTTACAG TCTATCCCGG 6870 
CAATAGTTAC CCTTATTATC AAGATAAGAA AGAAAAGGAT TTTTCGCTAC 6920
GCTCAAATCC TTTAAAAAAA CACAAAAGAC CACATTTTTT AATGTGGTCT 6970 
TTATTCTTCA ACTAAAGCAC CCATTAGTTC AACAAACGAA AATTGGATAA 7020 
AGTGGGATAT TTTTAAAATA TATATTTATG TTACAGTAAT ATTGACTTTT 7070 
AAAAAAGGAT TGATTCTAAT GAAGAAAGCA GACAAGTAAG CCTCCTAAAT 7120 

TCACTTTAGA TAAAAATTTA GGAGGCATAT CAAATGAACT TTAATAAAAT 7170
TGATTTAGAC AATTGGAAGA GAAAAGAGAT ATTTAATCAT TATTTGAACC 7220 
AACAAACGAC TTTTAGTATA ACCACAGAAA TTGATATTAG TGTTTTATAC 7270 
CGAAACATAA AACAAGAAGG ATATAAATTT TACCCTGCAT TTATTTTCTT 7320 
AGTGACAAGG GTGATAAACT CAAATACAGC TTTTAGAACT GGTTACAATA 7370 

GCGACGGAGA GTTAGGTTAT TGGGATAAGT TAGAGCCACT TTATACAATT 7420
TTTGATGGTG TATCTAAAAC ATTCTCTGGT ATTTGGACTC CTGTAAAGAA 7470 
TGACTTCAAA GAGTTTTATG ATTTATACCT TTCTGATGTA GAGAAATATA 7520 
ATGGTTCGGG GAAATTGTTT CCCAAAACAC CTATACCTGA AAATGCTTTT 7570 
TCTCTTTCTA TTATTCCATG GACTTCATTT ACTGGGTTTA ACTTAAATAT 7620 

CAATAATAAT AGTAATTACC TTCTACCCAT TATTACAGCA GGAAAATTCA 7670
TTAATAAAGG TAATTCAATA TATTTACCGC TATCTTTACA GGTACATCAT 7720 
TCTGTTTGTG ATGGTTATCA TGCAGGATTG TTTATGAACT CTATTCAGGA 7770 
ATTGTCAGAT AGGCCTAATG ACTGGCTTTT ATAATATGAG ATAATGCCGA 7820 
CTGTACTTTT TACAGTCGGT TTTCTAATGT CACTAACCTG CCCCGTTAGT 7870 

TGAAGAAGGT TTTTATATTA CAGCTCCAGA TCCATATCCT TCTTTTTCTG 7920
AACCGACTTC TCCTTTTTCG CTTCTTTATT CCAATTGCTT TATTGACGTT 7970
GAGCCTCGGA ACCCNTATAG TGTGTTATAC TTTACTTGGA AGTGGTTGCC 8020
GGAAAGAGCG AAAATGCCTC ACATTTGTGC CACCTAAAAA GGAGCGATTT 8070
ACATATGAGT TATGCAGTTT GTAGAATGCA AAAAGTGAAA TCAGGATCN 8119 

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 382 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
Val Arg Gly Lys Lys Val Trp Ile Ser Leu Leu Phe Ala Leu Ala
-107 -105 -100 -95

Leu Ile Phe Thr Met Ala Phe Gly Ser Thr Ser Ser Ala Gln Ala
-90 -85 -80

Ala Gly Lys Ser Asn Gly Glu Lys Lys Tyr Ile Val Gly Phe Lys
-75 -70 -65

Gln Thr Met Ser Thr Met Ser Ala Ala Lys Lys Lys Asp Val Ile
-60 -55 -50

Ser Glu Lys Gly Gly Lys Val Gln Lys Gln Phe Lys Tyr Val Asp
-45 -40 -35

Ala Ala Ser Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys
-30 -25 -20

Lys Asp Pro Ser Val Ala Tyr Val Glu Glu Asp His Val Ala His
-15 -10 -5

Ala Tyr Ala Gln Ser Val Pro Tyr Gly Val Ser Gln Ile Lys Ala
1 5 10

Pro Ala Leu His Ser Gln Gly Tyr Thr Gly Ser Asn Val Lys Val
15 20 25

Ala Val Ile Asp Ser Gly Ile Asp Ser Ser His Pro Asp Leu Lys
30 35 40

Val Ala Gly Gly Ala Ser Met Val Pro Ser Glu Thr Asn Pro Phe
45 50 55

Gln Asp Asn Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala
60 65 70

Ala Leu Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser Ala
75 80 85

Ser Leu Tyr Ala Val Lys Val Leu Gly Ala Asp Gly Ser Gly Gln
90 95 100

Tyr Ser Trp Ile Ile Asn Gly Ile Glu Trp Ala Ile Ala Asn Asn
105 110 115

Met Asp Val Ile Asn Met Ser Leu Gly Gly Pro Ser Gly Ser Ala
120 125 130

Ala Leu Lys Ala Ala Val Asp Lys Ala Val Ala Ser Gly Val Val
135 140 145

Val Val Ala Ala Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser
150 155 160

Thr Val Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly
165 170 175

Ala Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser Val Gly
180 185 190

Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser Ile Gln Ser Thr
195 200 205
Leu Pro Gly Asn Lys Tyr Gly Ala Tyr Asn Gly Thr Ser Met Ala
210 215 220

Ser Pro His Val Ala Gly Ala Ala Ala Leu Ile Leu Ser Lys His
225 230 235

Pro Asn Trp Thr Asn Thr Gln Val Arg Ser Ser Leu Glu Asn Thr
240 245 250

Thr Thr Lys Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile
255 260 265

Asn Val Gln Ala Ala Ala Gln
270 275

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Ser Leu Gly Gly Pro Ser Gly
1 5 7

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Ala Ala Ala Gly Asn Glu Gly
1 5 7

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Ser Thr Val Gly Tyr Pro
1 5 6

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Ser Trp Gly Pro Ala Asp Asp
1 5 7

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Phe Ala Ser Gly Asn Gly Gly 
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Cys Asn Tyr Asp Gly Tyr Thr 
1 5 7

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Ser Trp Gly Pro Glu Asp Asp 
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Trp Ala Ser Gly Asn Gly Gly
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7 amino acids
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Cys Asn Cys Asp Gly Tyr Thr 
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Trp Ala Ser Gly Asp Gly Gly 
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 7 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Cys Asn Cys Asp Gly Tyr Ala
1 5 7 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6 amino acids
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Val Ile Asp Ser Gly Ile
1 5 6 

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Asp Asn Asn Ser His 
1 5 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Ile Val Asp Asp Gly Leu
1 5 6

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

Ser Asp Asp Tyr His 
1 5 






Printed from Mimosa 02/15/2000 



WO 96/27671 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Ile Leu Asp Asp Gly Ile
1 5 6

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

Asn Asp Asn Arg His
1 5 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

Ile Met Asp Asp Gly Ile
1 5 6 

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:

Trp Phe Asn Ser His 
1 5 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 27 base pairs

(B) TYPE: Nucleic Acid
(C) STRANDEDNESS: Single
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 



GCGGTTATCG ACGACGGTAT CGATTCT 27

(2) INFORMATION FOR SEQ ID NO:23: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 27 base pairs 
(B) TYPE: Nucleic Acid
(C) STRANDEDNESS: Single
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 



GCGGTTATCG ACAAAGGTAT CGATTCT 27
(2) INFORMATION FOR SEQ ID NO:24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: Nucleic Acid 
(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 



GCGGTTATCG ACGAAGGTAT CGATTCT 27 
(2) INFORMATION FOR SEQ ID NO:25: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:



CCAAGACAAC GACTCTCACG GAA 23 

(2) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 23 base pairs 
(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 



CCAAGACAAC AGCTCTCACG GAA 23 
(2) INFORMATION FOR SEQ ID NO:27:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 



CCAAGACAAC AAATCTCACG GAA 23 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 42 base pairs
(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:



CACTTCCGGC AGCTCGTCGA CAGTGGACTA CCCTGGCAAA TA 42
(2) INFORMATION FOR SEQ ID NO:29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: Nucleic Acid 
(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 



CACTTCCGGC AGCTCGTCGA CAGTGGAGTA CCCTGGCAAA TA 42 
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 41 base pairs

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:



TTAACATGAG CCTCGGCCCA GCTAGCGGTT CTGCTGCTTT A 41 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 43 base pairs
(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:



TTAACATGAG CCTCGGCCCC GCGGATGATT CTGCTGCTTT AAA 43
(2) INFORMATION FOR SEQ ID NO:32:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 



CGGCAGCTCA AGCAACGATG GCTATCCTGG CAAATACCCT TCTGTCA 47 
(2) INFORMATION FOR SEQ ID NO:33: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 44 base pairs
(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 



ACTTCCGGCA GCTCTTCGAA CTACGACGGG TACCCTGGCA AATA 44
(2) INFORMATION FOR SEQ ID NO:34:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

Asn Leu Thr Ala Arg 
1 5 

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

Asn Leu Met Arg Lys
1 5 

(2) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

Thr Ala Ser Arg Arg 
1 5 

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:

Leu Thr Arg Arg Ser 
1 5 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
Ala Leu Ser Arg Lys
1 5 

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39: 

Leu Met Leu Arg Lys 
1 5

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40: 

Ala Ser Thr His Phe 

1 5 

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: 

Gln Lys Pro Asn Phe
1 5 

(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

Arg Lys Pro Thr His 
1 5 

(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43:

Ile Gln Gln Gln Tyr
1 5 

(2) INFORMATION FOR SEQ ID NO:44: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:

Arg Pro Gly Ala Met 

1 5 

(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 

Gln Gly Glu Leu Pro
1 5

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

Ala Pro Asp Pro Thr 
1 5 

(2) INFORMATION FOR SEQ ID NO:47: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 

Gln Leu Leu Glu His
1 5 

(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 5 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 

Val Asn Asn Asn His 

1 5 

(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:49:

Ala Gln Ser Asn Leu
1 5 

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 

Thr Ala Ser Arg Arg
1 5 

(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:51:

His His His His His His 
1 5 6 

(2) INFORMATION FOR SEQ ID NO: 52:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:52:

Leu Met Arg Lys 
1 4 

(2) INFORMATION FOR SEQ ID NO:53: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:53: 

Leu Thr Ala Arg 
1 4

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:54: 

Gly Pro Gly Gly 
1 4 
(2) INFORMATION FOR SEQ ID NO: 55:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:55:

Gly Leu Met Arg Lys
1 5

(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 

Ala Ala Pro Phe
1 4

(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 13 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:

Gly Pro Gly Gly Xaa Xaa Xaa Xaa Xaa Gly Gly Pro Gly 
1 5 10 13 

(2) INFORMATION FOR SEQ ID NO: 58:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:58:

Ala Ala Pro Lys 
1 4 

(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 

Ala Ala Pro Arg 
1 4

(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:

Ala Ala Pro Met
1 4

(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:

Ala Ala Pro Gln
1 4 

(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 

Ala Ala Lys Phe 
1 4

(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:63:

Ala Ala Ala Phe 
1 4 

(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 

Ala Ala Arg Phe
1 4 

(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 
Ala Ala Asp Phe 
1 4

(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids 
5 (B) TYPE: Amino Acid 

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 

Ala Ala Lys Lys 
1 4 

(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:

Ala Ala Lys Arg 
1 4 

(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 

Ala Ala Lys Phe 
1 4

(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 

Ala Ala Pro Xaa 
1 4 

(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:70: 

Ala Ala Xaa Phe
1 4 

(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 

Ala Ala Xaa Xaa Xaa

1 5 

(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 275 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72: 

Ala Gln Ser Val Pro Tyr Gly Val Ser Gln Ile Lys Ala Pro Ala
1 5 10 15

Leu His Ser Gln Gly Tyr Thr Gly Ser Asn Val Lys Val Ala Val
20 25 30

Ile Asp Ser Gly Ile Asp Ser Ser His Pro Asp Leu Lys Val Ala
35 40 45

Gly Gly Ala Ser Met Val Pro Ser Glu Thr Asn Pro Phe Gln Asp
50 55 60

Asn Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala Ala Leu
65 70 75

Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser Ala Ser Leu
80 85 90

Tyr Ala Val Lys Val Leu Gly Ala Asp Gly Ser Gly Gln Tyr Ser
95 100 105

Trp Ile Ile Asn Gly Ile Glu Trp Ala Ile Ala Asn Asn Met Asp
110 115 120

Val Ile Asn Met Ser Leu Gly Gly Pro Ser Gly Ser Ala Ala Leu
125 130 135

Lys Ala Ala Val Asp Lys Ala Val Ala Ser Gly Val Val Val Val
140 145 150

Ala Ala Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser Thr Val
155 160 165

Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly Ala Val
170 175 180

Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser Val Gly Pro Glu
185 190 195

Leu Asp Val Met Ala Pro Gly Val Ser Ile Gln Ser Thr Leu Pro
200 205 210

Gly Asn Lys Tyr Gly Ala Tyr Asn Gly Thr Ser Met Ala Ser Pro
215 220 225

His Val Ala Gly Ala Ala Ala Leu Ile Leu Ser Lys His Pro Asn
230 235 240


Trp Thr Asn Thr Gln Val Arg Ser Ser Leu Glu Asn Thr Thr Thr
245 250 255

Lys Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile Asn Val
260 265 270

Gln Ala Ala Ala Gln
275 

(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:73: 

Arg Val Arg Arg 
1 4 

(2) INFORMATION FOR SEQ ID NO: 74:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1146 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:74: 

GTG AGA GGC AAA AAA GTA TGG ATC AGT TTG CTG TTT 36
Val Arg Gly Lys Lys Val Trp Ile Ser Leu Leu Phe
-107 -105 -100

GCT TTA GCG TTA ATC TTT ACG ATG GCG TTC GGC AGC ACA 75
Ala Leu Ala Leu Ile Phe Thr Met Ala Phe Gly Ser Thr
-95 -90 -85

TCC TCT GCC CAG GCG GCA GGG AAA TCA AAC GGG GAA AAG 114
Ser Ser Ala Gln Ala Ala Gly Lys Ser Asn Gly Glu Lys
-80 -75 -70

AAA TAT ATT GTC GGG TTT AAA CAG ACA ATG AGC ACG ATG 153
Lys Tyr Ile Val Gly Phe Lys Gln Thr Met Ser Thr Met
-65 -60

AGC GCC GCT AAG AAG AAA GAT GTC ATT TCT GAA AAA GGC 192
Ser Ala Ala Lys Lys Lys Asp Val Ile Ser Glu Lys Gly
-55 -50 -45

GGG AAA GTG CAA AAG CAA TTC AAA TAT GTA GAC GCA GCT 231
Gly Lys Val Gln Lys Gln Phe Lys Tyr Val Asp Ala Ala
-40 -35

TCA GCT ACA TTA AAC GAA AAA GCT GTA AAA GAA TTG AAA 270
Ser Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys
-30 -25 -20

AAA GAC CCG AGC GTC GCT TAC GTT GAA GAA GAT CAC GTA 309
Lys Asp Pro Ser Val Ala Tyr Val Glu Glu Asp His Val
-15 -10 -5

AGA CAT AAG CGC GCG CAG TCC GTG CCT TAC GGC GTA TCA 348
Arg His Lys Arg Ala Gln Ser Val Pro Tyr Gly Val Ser
1 5

CAA ATT AAA GCC CCT GCT CTG CAC TCT CAA GGC TAC ACT 387
Gln Ile Lys Ala Pro Ala Leu His Ser Gln Gly Tyr Thr
10 15 20

GGA TCA AAT GTT AAA GTA GCG GTT ATC GAC AGC GGT ATC 426
Gly Ser Asn Val Lys Val Ala Val Ile Asp Ser Gly Ile
25 30 35

GAT TCT TCT CAT CCT GAT TTA AAG GTA GCA GGC GGA GCC 465
Asp Ser Ser His Pro Asp Leu Lys Val Ala Gly Gly Ala
40 45

AGC ATG GTT CCT TCT GAA ACA AAT CCT TTC CAA GAC AAC 504
Ser Met Val Pro Ser Glu Thr Asn Pro Phe Gln Asp Asn
50 55 60

GAC TCT CAC GGA ACT CAC GTT GCC GGC ACA GTT GCG GCT 543
Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala Ala
65 70

CTT AAT AAC TCA ATC GGT GTA TTA GGC GTT GCG CCA AGC 582
Leu Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser
75 80 85

GCA TCA CTT TAC GCT GTA AAA GTT CTC GGT GCT GAC GGT 621
Ala Ser Leu Tyr Ala Val Lys Val Leu Gly Ala Asp Gly
90 95 100

TCC GGC CAA GAT AGC TGG ATC ATT AAC GGA ATC GAG TGG 660
Ser Gly Gln Asp Ser Trp Ile Ile Asn Gly Ile Glu Trp
105 110

GCG ATC GCA AAC AAT ATG GAC GTT ATT AAC ATG AGC CTC 699
Ala Ile Ala Asn Asn Met Asp Val Ile Asn Met Ser Leu
115 120 125

GGC GGA CCT TCT GGT TCT GCT GCT TTA AAA GCG GCA GTT 738
Gly Gly Pro Ser Gly Ser Ala Ala Leu Lys Ala Ala Val
130 135

GAT AAA GCC GTT GCA TCC GGC GTC GTA GTC GTT GCG GCA 777
Asp Lys Ala Val Ala Ser Gly Val Val Val Val Ala Ala
140 145 150

GCC GGT AAC GAA GGC ACT TCC GGC AGC TCG TCG ACA GTG 816
Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser Thr Val
155 160 165

GAC TAC CCT GGC AAA TAC CCT TCT GTC ATT GCA GTA GGC 855
Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly
170 175

GCT GTT GAC AGC AGC AAC CAA AGA GCA TCT TTC TCA AGC 894
Ala Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser
180 185 190

GTA GGA CCT GAG CTT GAT GTC ATG GCA CCT GGC GTA TCT 933
Val Gly Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser
195 200

ATC CAA AGC ACG CTT CCT GGA AAC AAA TAC GGG GCG TAC 972
Ile Gln Ser Thr Leu Pro Gly Asn Lys Tyr Gly Ala Tyr
205 210 215

AAC GGT ACC TCA ATG GCA TCT CCG CAC GTT GCC GGA GCG 1011
Asn Gly Thr Ser Met Ala Ser Pro His Val Ala Gly Ala
220 225 230

GCT GCT TTG ATT CTT TCT AAG CAC CCG AAC TGG ACA AAC 1050
Ala Ala Leu Ile Leu Ser Lys His Pro Asn Trp Thr Asn
235 240

ACT CAA GTC CGC AGC AGT TTA GAA AAC ACC ACT ACA AAA 1089
Thr Gln Val Arg Ser Ser Leu Glu Asn Thr Thr Thr Lys
245 250 255

CTT GGT GAT TCT TTC TAC TAT GGA AAA GGG CTG ATC AAC 1128
Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile Asn
260 265

GTA CAG GCG GCA GCT CAG 1146
Val Gln Ala Ala Ala Gln
270 275 

(2) INFORMATION FOR SEQ ID NO:75: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 382 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75:

Val Arg Gly Lys Lys Val Trp Ile Ser Leu Leu Phe Ala Leu Ala
-107 -105 -100 -95

Leu Ile Phe Thr Met Ala Phe Gly Ser Thr Ser Ser Ala Gln Ala
-90 -85 -80

Ala Gly Lys Ser Asn Gly Glu Lys Lys Tyr Ile Val Gly Phe Lys
-75 -70 -65

Gln Thr Met Ser Thr Met Ser Ala Ala Lys Lys Lys Asp Val Ile
-60 -55 -50

Ser Glu Lys Gly Gly Lys Val Gln Lys Gln Phe Lys Tyr Val Asp
-45 -40 -35

Ala Ala Ser Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys
-30 -25 -20

Lys Asp Pro Ser Val Ala Tyr Val Glu Glu Asp His Val Arg His
-15 -10 -5

Lys Arg Ala Gln Ser Val Pro Tyr Gly Val Ser Gln Ile Lys Ala
1 5 10

Pro Ala Leu His Ser Gln Gly Tyr Thr Gly Ser Asn Val Lys Val
15 20 25

Ala Val Ile Asp Ser Gly Ile Asp Ser Ser His Pro Asp Leu Lys
30 35 40

Val Ala Gly Gly Ala Ser Met Val Pro Ser Glu Thr Asn Pro Phe
45 50 55

Gln Asp Asn Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala
60 65 70

Ala Leu Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser Ala
75 80 85

Ser Leu Tyr Ala Val Lys Val Leu Gly Ala Asp Gly Ser Gly Gln
90 95 100

Asp Ser Trp Ile Ile Asn Gly Ile Glu Trp Ala Ile Ala Asn Asn
105 110 115

Met Asp Val Ile Asn Met Ser Leu Gly Gly Pro Ser Gly Ser Ala
120 125 130

Ala Leu Lys Ala Ala Val Asp Lys Ala Val Ala Ser Gly Val Val
135 140 145

Val Val Ala Ala Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser
150 155 160

Thr Val Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly
165 170 175

Ala Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser Val Gly
180 185 190

Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser Ile Gln Ser Thr
195 200 205

Leu Pro Gly Asn Lys Tyr Gly Ala Tyr Asn Gly Thr Ser Met Ala
210 215 220

Ser Pro His Val Ala Gly Ala Ala Ala Leu Ile Leu Ser Lys His
225 230 235

Pro Asn Trp Thr Asn Thr Gln Val Arg Ser Ser Leu Glu Asn Thr
240 245 250

Thr Thr Lys Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile
255 260 265

Asn Val Gln Ala Ala Ala Gln
270 275



(2) INFORMATION FOR SEQ ID NO: 76:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76:

(2) INFORMATION FOR SEQ ID NO: 77:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 amino acids

(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77:

Gly Ser Gly Gln Tyr Ser Trp Ile Ile Asn Gly
1 5 10 11 

(2) INFORMATION FOR SEQ ID NO:78: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 11 amino acids

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:78: 

Gly Asp Ile Thr Thr Glu Asp Glu Ala Ala Ser
1 5 10 11

(2) INFORMATION FOR SEQ ID NO: 79: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:79: 

Gly Glu Val Thr Asp Ala Val Glu Ala Arg Ser 
1 5 10 11 

(2) INFORMATION FOR SEQ ID NO: 80:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 11 amino acids 

(B) TYPE: Amino Acid 
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: BO: 

25 Pro Phe Met Thr Asp He He Glu Ala Ser Ser 
1 5 10 11 

(2) INFORMATION FOR SEQ ID NO: 81: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 11 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81: 

Gly Ile Val Thr Asp Ala Ile Glu Ala Ser Ser
1 5 10 11

(2) INFORMATION FOR SEQ ID NO: 82:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82: 



GGTTCCGGCC AAGATAGCTG GATCATT 27 
(2) INFORMATION FOR SEQ ID NO: 83: 
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83: 



CCAATACAGC TGGGAAATTA ACGGAATCG 29 

(2) INFORMATION FOR SEQ ID NO: 84: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 31 base pairs

(B) TYPE: Nucleic Acid
(C) STRANDEDNESS: Single
(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:84: 



GGTTCCGGCC AAGATAGCTG GGAAATTAAC G 31
(2) INFORMATION FOR SEQ ID NO: 85:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 30 base pairs

(B) TYPE: Nucleic Acid
(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85: 



AAGAAGATCA CGTAAGACAT AAGCGCGCGC 30 

(2) INFORMATION FOR SEQ ID NO: 86: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86:

Arg Ala Lys Arg
1 4 

(2) INFORMATION FOR SEQ ID NO: 87: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 4 amino acids 
(B) TYPE: Amino Acid

(D) TOPOLOGY: Linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87: 

Lys Ala Lys Arg 
1 4 

(2) INFORMATION FOR SEQ ID NO: 88:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 8 amino acids 
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88:

Gly Pro Gly Gly Leu Met Arg Lys
1 5 8

(2) INFORMATION FOR SEQ ID NO: 89: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 amino acids
(B) TYPE: Amino Acid
(D) TOPOLOGY: Linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89: 

Gly Pro Gly Gly Lys Ala Lys Arg 
1 5 8 
What is claimed is:

1. A subtilisin variant derived from a precursor subtilisin-type serine protease, said variant
capable of cleaving a polypeptide substrate comprising the sequence:

P4-P3-P2-P1-C(=O)-NH-P1'

wherein

P4 is a basic amino acid;
P3 is any amino acid selected from the naturally occurring amino acids;

P2 is a basic amino acid;
P1 is a basic amino acid; and
P1' is not Pro.

2. The subtilisin variant of claim 1 containing an acidic amino acid at a residue equivalent to
Asn 62, Tyr 104 and Gly 166 of the subtilisin naturally produced by Bacillus amyloliquefaciens.

3. The subtilisin-type serine protease variant of claim 2 wherein the acidic amino acid is Asp
or Glu.

4. The subtilisin-type serine protease variant of claim 3 wherein the acidic amino acid is Asp.

5. The subtilisin-type serine protease variant of claim 2 wherein the precursor subtilisin-type
serine protease is the subtilisin naturally produced by Bacillus amyloliquefaciens.

6. The subtilisin variant of claim 5 having the amino acid sequence of the mature polypeptide
of Figure 8 (SEQ ID NO: 75).

7. A subtilisin variant having substrate specificity for peptide substrates containing dibasic 
amino acid sequences. 

8. The subtilisin variant of claim 7 having a different amino acid residue at residue position +62

than subtilisin naturally produced by Bacillus amyloliquefaciens. 

9. The subtilisin variant of Claim 8 having an Asp or Glu at residue position +62.

10. The subtilisin variant of Claim 9 having an Asp at residue position +62. 

11. The subtilisin variant of Claim 10 further having an Asp or Glu at residue position +166. 
12. The subtilisin variant of Claim 11 having an Asp at residue position +166.

13. The subtilisin variant of Claim 12 having the amino acid sequence of the mature polypeptide
provided in Fig. 6. 

14. An isolated nucleic acid molecule encoding the subtilisin variant of Claim 1.

15. The nucleic acid molecule of Claim 14 further comprising a promoter operably linked to the 
nucleic acid molecule.

16. An expression vector comprising the nucleic acid molecule of Claim 15 operably linked to
control sequences recognized by a host cell transformed with the vector. 

17. A host cell transformed with the vector of Claim 16.

18. An isolated nucleic acid molecule encoding the subtilisin variant of Claim 7.
19. The nucleic acid molecule of Claim 18 further comprising a promoter operably linked to the
nucleic acid molecule. 

20. An expression vector comprising the nucleic acid molecule of Claim 19 operably linked to 
control sequences recognized by a host cell transformed with the vector. 

21. A host cell transformed with the vector of Claim 20. 

22. A process of using the nucleic acid molecule encoding the subtilisin variant to effect 
production of the subtilisin variant comprising culturing the host cell of Claim 21 under conditions suitable
for expression of the subtilisin variant.

23. The process of Claim 22 further comprising recovering the subtilisin variant from the host 
cell culture medium. 

24. A method of using the subtilisin variant of Claim 1 comprising contacting a fusion protein 
containing a dibasic sequence with the subtilisin variant. 

25. A process for cleaving a polypeptide, said polypeptide comprising an amino acid sequence 
represented by the formula: 

P4-P3-P2-P1-P1'
wherein, 

P4 is a basic amino acid; 

P3 is an amino acid selected from the naturally occurring amino acids; 
P2 is a basic amino acid;
P1 is a basic amino acid; and
P1' is not Pro;
comprising the step of: 

subjecting said polypeptide to the subtilisin variant of claim 1 in a reaction mixture under conditions
such that the subtilisin variant cleaves the polypeptide.

26. A process of using the nucleic acid molecule encoding the subtilisin variant to effect 
production of the subtilisin variant comprising culturing the host cell of Claim 17 under conditions suitable
for expression of the subtilisin variant.

27. The process of Claim 26 further comprising recovering the subtilisin variant from the host 
cell culture medium. 

28. A method of using the subtilisin variant of Claim 7 comprising contacting a fusion protein 
containing a dibasic sequence with the subtilisin variant. 

29. A process for cleaving a polypeptide, said polypeptide comprising an amino acid sequence 
represented by the formula: 

P4-P3-P2-P1-P1'
wherein, 

P4 is a large hydrophobic amino acid; 

P3 is an amino acid selected from the naturally occurring amino acids; 
P2 is a basic amino acid; 
P1 is a basic amino acid; and
P1' is not Pro;
comprising the step of: 

subjecting said polypeptide to the subtilisin variant of claim 7 in a reaction mixture under conditions
such that the subtilisin variant cleaves the polypeptide.
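The substrate formulas in claims 1, 25 and 29 can be read as a sequence pattern over P4-P3-P2-P1-P1'. The Python sketch below illustrates that reading under stated assumptions: it treats Arg, Lys and His as the basic amino acids and uses a hypothetical set of large hydrophobic residues (Leu, Ile, Met, Phe, Tyr, Trp, Val), since the claims enumerate neither group; the function name find_cleavage_sites is likewise hypothetical. For each window it requires P4 to fall in the chosen set, leaves P3 unrestricted, requires P2 and P1 to be basic, and requires a P1' residue that is not Pro.

```python
from typing import Iterable, List

# Assumption: "basic" is taken to mean Arg, Lys or His; the claims do not list them.
BASIC = {"Arg", "Lys", "His"}
# Hypothetical grouping of "large hydrophobic" residues, chosen for illustration only.
LARGE_HYDROPHOBIC = {"Leu", "Ile", "Met", "Phe", "Tyr", "Trp", "Val"}


def find_cleavage_sites(residues: List[str], p4_allowed: Iterable[str]) -> List[int]:
    """Return the index of every P1 residue in a P4-P3-P2-P1-P1' match.

    residues uses three-letter codes. P4 must be in p4_allowed, P3 is
    unrestricted, P2 and P1 must be basic, and P1' must be present and not Pro.
    The scissile bond lies between residues[i] and residues[i + 1].
    """
    allowed = set(p4_allowed)
    sites = []
    # i indexes P1, so P4 sits at i - 3 and P1' at i + 1.
    for i in range(3, len(residues) - 1):
        p4, p2, p1, p1_prime = residues[i - 3], residues[i - 1], residues[i], residues[i + 1]
        if p4 in allowed and p2 in BASIC and p1 in BASIC and p1_prime != "Pro":
            sites.append(i)
    return sites


if __name__ == "__main__":
    # SEQ ID NO: 89 and SEQ ID NO: 88, each with a hypothetical Ala appended as P1'.
    seq89 = "Gly Pro Gly Gly Lys Ala Lys Arg Ala".split()
    seq88 = "Gly Pro Gly Gly Leu Met Arg Lys Ala".split()
    print(find_cleavage_sites(seq89, BASIC))              # [7]  (claims 1 and 25 pattern)
    print(find_cleavage_sites(seq88, LARGE_HYDROPHOBIC))  # [7]  (claim 29 pattern)
```

In both cases the sketch flags the bond after the final basic residue (Arg in SEQ ID NO: 89, Lys in SEQ ID NO: 88). It only screens a sequence for the motif and says nothing about cleavage rates.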






Printed from Mimosa 02/15/2000 







WO 96/27671 






PCT/US96/02861




Figure 2 







Figure 3 [bar chart; x-axis: Subtilisin Mutant (WT, S33D, N62D, S33D/N62D)]






Figure 4 [bar chart; x-axis: Subtilisin Mutant (WT, N62D/G166E, N62D/G166D); legend: P1 + P2 residues (Xaa2-Xaa1) of the succinyl-AA-Xaa2-Xaa1-pNA substrate: KK, KR, KF, PK, PF, AF]







FIGURE 5 [plot; x-axis: Time (min.)]






Figure 6-1






Figure 6-2






Figure 6-3







Figure 6-4 







Figure 6-5 







Figure 6-6 






Figure 6-7







Figure 6-8







Figure 6-9 






Figure 6-10






Val Arg Gly Lys Lys Val Trp Ile Ser Leu Leu Phe Ala Leu Ala
-105 -100 -95

Leu Ile Phe Thr Met Ala Phe Gly Ser Thr Ser Ser Ala Gln Ala
-90 -85 -80

Ala Gly Lys Ser Asn Gly Glu Lys Lys Tyr Ile Val Gly Phe Lys
-75 -70 -65

Gln Thr Met Ser Thr Met Ser Ala Ala Lys Lys Lys Asp Val Ile
-60 -55 -50

Ser Glu Lys Gly Gly Lys Val Gln Lys Gln Phe Lys Tyr Val Asp
-45 -40 -35

Ala Ala Ser Ala Thr Leu Asn Glu Lys Ala Val Lys Glu Leu Lys
-30 -25 -20

Lys Asp Pro Ser Val Ala Tyr Val Glu Glu Asp His Val Arg His
-15 -10 -5

Lys Arg Ala Gln Ser Val Pro Tyr Gly Val Ser Gln Ile Lys Ala
1 5 10

Pro Ala Leu His Ser Gln Gly Tyr Thr Gly Ser Asn Val Lys Val
15 20 25

Ala Val Ile Asp Ser Gly Ile Asp Ser Ser His Pro Asp Leu Lys
30 35 40

Val Ala Gly Gly Ala Ser Met Val Pro Ser Glu Thr Asn Pro Phe
45 50 55

Gln Asp Asn Asp Ser His Gly Thr His Val Ala Gly Thr Val Ala
60 65 70

Ala Leu Asn Asn Ser Ile Gly Val Leu Gly Val Ala Pro Ser Ala
75 80 85

Ser Leu Tyr Ala Val Lys Val Leu Gly Ala Asp Gly Ser Gly Gln
90 95 100

Asp Ser Trp Ile Ile Asn Gly Ile Glu Trp Ala Ile Ala Asn Asn
105 110 115

Met Asp Val Ile Asn Met Ser Leu Gly Gly Pro Ser Gly Ser Ala
120 125 130

Ala Leu Lys Ala Ala Val Asp Lys Ala Val Ala Ser Gly Val Val
135 140 145

Val Val Ala Ala Ala Gly Asn Glu Gly Thr Ser Gly Ser Ser Ser
150 155 160

Thr Val Asp Tyr Pro Gly Lys Tyr Pro Ser Val Ile Ala Val Gly
165 170 175

Ala Val Asp Ser Ser Asn Gln Arg Ala Ser Phe Ser Ser Val Gly
180 185 190

Pro Glu Leu Asp Val Met Ala Pro Gly Val Ser Ile Gln Ser Thr
195 200 205

Leu Pro Gly Asn Lys Tyr Gly Ala Tyr Asn Gly Thr Ser Met Ala
210 215 220

Ser Pro His Val Ala Gly Ala Ala Ala Leu Ile Leu Ser Lys His
225 230 235

Pro Asn Trp Thr Asn Thr Gln Val Arg Ser Ser Leu Glu Asn Thr
240 245 250

Thr Thr Lys Leu Gly Asp Ser Phe Tyr Tyr Gly Lys Gly Leu Ile
255 260 265

Asn Val Gln Ala Ala Ala Gln
270 275

FIGURE 6






INTERNATIONAL SEARCH REPORT 



International Application No

PCT/US 96/02861

IPC 6 C12N9/54 C12N1/21 C12P21/06

According to International Patent Classification (IPC) or to both national classification and IPC



B. FIELDS SEARCHED 



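Minimum documentation searched (classification system followed by classification symbols)

IPC 6 C12N C12P

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched

Electronic data base consulted during the international search (name of data base and, where practical, search terms used)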



C. DOCUMENTS CONSIDERED TO BE RELEVANT 



Citation of document, with indication, where appropriate, of the relevant passages



PROTEIN ENGINEERING,
vol. 4, no. 7, 1991,
pages 719-737, XP002008733
R. SIEZEN ET AL: "Homology modelling and protein engineering strategy of subtilases, the family of subtilisin-like serine proteinases"
see the whole document
especially figure 3, table V and page 729, right column, last paragraph - page 739 and Discussion



1-23,
25-27,29



1-29 



Further documents are listed in the continuation of box C.



* Special categories of cited documents:

"A" document defining the general state of the art which is not considered to be of particular relevance
"E" earlier document but published on or after the international filing date
"L" document which may throw doubts on priority claim(s) or which is cited to establish the publication date of another citation or other special reason (as specified)
"O" document referring to an oral disclosure, use, exhibition or other means
"P" document published prior to the international filing date but later than the priority date claimed
"X" document of particular relevance; the claimed invention cannot be considered novel or cannot be considered to involve an inventive step when the document is taken alone




Date of the actual completion of the international search

18 July 1996 



Name and mailing address of the ISA

European Patent Office, P.B. 5818 Patentlaan 2
NL - 2280 HV Rijswijk
Tel. (+31-70) 340-2040, Tx. 31 651 epo nl,
Fax: (+31-70) 340-3016



Date of mailing of the international search report

31.07.96

Van der Schaal, C






C. (Continuation) DOCUMENTS CONSIDERED TO BE RELEVANT

Relevant to claim No.



PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA,
vol. 84, August 1987, WASHINGTON US,
pages 5167-5171, XP002008734
J. WELLS ET AL: "Recruitment of substrate-specificity properties from one enzyme into a related one by protein engineering"
see the whole document

PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA,
vol. 84, March 1987, WASHINGTON US,
pages 1219-1223, XP002008735
J. WELLS ET AL: "Designing substrate specificity by protein engineering of electrostatic interactions"
see the whole document

SCIENCE,
vol. 233, 1986,
pages 659-663, XP002008736
D. ESTELL ET AL: "Probing steric and hydrophobic effects on enzyme-substrate interactions by protein engineering"
see the whole document

ANNALS NEW YORK ACADEMY OF SCIENCES,
vol. 672, 1992,
pages 71-79, XP002008737
T. GRAYCAR ET AL: "Altering the proteolytic activity of subtilisin through protein engineering"
see the whole document

BIOCHEMISTRY 32 (5), 1993, 1199-1203,
CODEN: BICHAW ISSN: 0006-2960, XP002008738
RHEINNECKER M ET AL: "ENGINEERING A NOVEL SPECIFICITY IN SUBTILISIN BPN'."
see the whole document

JOURNAL OF BIOLOGICAL CHEMISTRY,
vol. 267, no. 23, 15 August 1992, MD
pages 16335-16340, XP002008739
K. NAKAYAMA ET AL: "Consensus sequence for precursor processing at mono-arginyl sites"
see the whole document

EP,A,0 316 748 (HOECHST AG) 24 May 1989
see the whole document

WO,A,91 11454 (UPJOHN CO) 8 August 1991
see the whole document

-/- 



1-29

2-6,
11-13

2-6,
11-13

2-6

2-6

1,7,24,
25,28,29






C. (Continuation) DOCUMENTS CONSIDERED TO BE RELEVANT

P,X

P,X

Citation of document, with indication, where appropriate, of the relevant passages

P,Y



WO,A,95 30910 (PROCTER & GAMBLE) 9 November 1995
see claim 1

EP,A,0 405 901 (UNILEVER PLC; UNILEVER NV (NL)) 2 January 1991
see claim 1

BIOCHEMISTRY 34 (41), 1995, 13312-13319,
ISSN: 0006-2960,
17 October 1995, XP002008740
BALLINGER M D ET AL: "Designing subtilisin BPN' to cleave substrates containing dibasic residues."
see the whole document

KEYSTONE SYMPOSIUM ON PROCESSING ENZYMES: BIOCHEMISTRY, GENETICS AND CLINICAL RELEVANCE, LAKE TAHOE, CALIFORNIA, USA, MARCH 2-8, 1995. JOURNAL OF CELLULAR BIOCHEMISTRY SUPPLEMENT 0 (19B), 1995, 233, ISSN: 0733-1959,
4 March 1995, XP002008741
WELLS J A ET AL: "Subtilisin: An enzyme made for engineering."
see abstract B7-812



Relevant to claim No.

1-23,26,
27



1-23,26, 
27 



7-13, 

18-24, 

28,29 



1-6, 

14-17, 

25-27 

1-29 




Information on patent family members



Patent document            Publication   Patent family     Publication
cited in search report     date          member(s)         date

EP-A-0316740               24-05-89      DE-A-3739347      01-06-89
                                         AU-B-2569588      29-06-89
                                         CA-A-1337283      10-10-95
                                         DE-A-3873273      03-09-92
                                         ES-T-2051816      01-07-94
                                         IE-B-62220        11-01-95
                                         JP-A-1160496      23-06-89
                                         US-A-5270176      14-12-93

WO-A-9111454               08-08-91      AU-B-7166891      21-00-91
                                         DE-D-69101486     28-04-94
                                         DE-T-69101486     04-00-94
                                         EP-A-0511978      11-11-92
                                         ES-T-2062759      16-12-94
                                         JP-T-5503631      17-06-93

WO-A-9530010               09-11-95      AU-B-2159195      29-11-95
                                         AU-B-2293195      29-11-95
                                         WO-A-9529979      09-11-95
                                         AU-B-7870394      03-04-95
                                         CA-A-2170491      23-03-95
                                         EP-A-0719339      03-07-96
                                         NO-A-961067       15-05-96
                                         WO-A-9507991      23-03-95

EP-A-0405901               02-01-91      CA-A-2034406      27-12-90
                                         WO-A-9100334      10-01-91
                                         JP-T-4500385      23-01-92
                                         EP-A-0405902      02-01-91
                                         WO-A-9100335      10-01-91
                                         JP-T-4500384      23-01-92






